path: root/mdoc_validate.c
Commit messages, newest first; each ends with author, date, files changed, and lines deleted/added.
* store the operating system name obtained from uname(3) in the adequate struct together with similar state data rather than in a function-scope static variable, such that it can be free(3)d in roff_man_free(); no functional change
  (Ingo Schwarze, 2021-10-04, 1 file, -7/+6)
* Support auto-tagging for ".It Va". This combination is somewhat rare because few libraries expose so many global variables that they need a list to enumerate them, but when the idiom does occur, tagging the variable names is generally useful. For example, this helps awk(1), dc(1), make(1), rc.subr(8), ... Missing feature reported and patch reviewed, tested, and OK'ed by kn@.
  (Ingo Schwarze, 2021-07-18, 1 file, -2/+2)
* Promote section headers that can be used unmodified as fragment identifiers from TAG_WEAK to TAG_STRONG, such that for example ...#DESCRIPTION always works. Suggested by Aman Verma on the discuss@ list.
  (Ingo Schwarze, 2020-10-30, 1 file, -2/+2)
* While we do not recommend the idiom ".Fl Fl long" for long options because it is an abuse of semantic macros for device-specific presentational effects, this idiom is so widespread that it makes sense to convert it to the recommended ".Fl \-long" during the validation phase. For example, this improves HTML formatting in pages where authors have used the dubious .Fl Fl. Feature suggested by Steffen Nurpmeso <steffen at sdaoden dot eu> on freebsd-hackers.
  (Ingo Schwarze, 2020-04-26, 1 file, -2/+26)
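A string-level sketch of that rewrite follows. It is an assumption for illustration only: mandoc performs the conversion on the parsed syntax tree during validation, and the function name `rewrite_fl_fl()` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: rewrite the input line ".Fl Fl long" to the
 * recommended ".Fl \-long".  Lines without the idiom are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit into out.
 */
static int
rewrite_fl_fl(const char *line, char *out, size_t outsz)
{
	static const char idiom[] = ".Fl Fl ";
	size_t len = sizeof(idiom) - 1;
	int n;

	if (strncmp(line, idiom, len) == 0 && line[len] != '\0')
		n = snprintf(out, outsz, ".Fl \\-%s", line + len);
	else
		n = snprintf(out, outsz, "%s", line);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```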
* provide a STYLE message when mandoc knows the file name and the extension disagrees with the section number given in the .Dt or .TH macro; feature suggested and patch tested by jmc@
  (Ingo Schwarze, 2020-04-24, 1 file, -2/+8)
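A minimal sketch of the comparison, assuming the extension is everything after the last dot of the last path component and that it must equal the section string exactly. `extension_matches_section()` is a hypothetical name, and mandoc's real rules (for example for suffixed sections) may differ.

```c
#include <string.h>

/*
 * Hypothetical sketch: return 1 if the file name extension agrees with
 * the section given in .Dt or .TH, or if there is no extension to check;
 * return 0 if a STYLE message would be warranted.
 */
static int
extension_matches_section(const char *fname, const char *sec)
{
	const char *slash, *dot;

	/* Only look at the last path component. */
	slash = strrchr(fname, '/');
	if (slash != NULL)
		fname = slash + 1;
	dot = strrchr(fname, '.');
	if (dot == NULL || dot[1] == '\0')
		return 1;	/* nothing to compare against */
	return strcmp(dot + 1, sec) == 0;
}
```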
* When a .Tg is attached to a paragraph, attach the permalink to the first word, or the first few words if they are short.
  (Ingo Schwarze, 2020-04-18, 1 file, -2/+2)
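One way to pick that span is sketched below. The function `permalink_len()` and its `minlen` threshold are hypothetical; the commit does not say how short counts as short.

```c
#include <stddef.h>

/*
 * Hypothetical sketch: return how many bytes at the start of a paragraph
 * the permalink should cover: the first word, extended by further words
 * while the covered text is still shorter than minlen bytes.
 * Words are separated by blanks; minlen is an assumed tuning parameter.
 */
static size_t
permalink_len(const char *s, size_t minlen)
{
	size_t i = 0, end = 0;

	while (s[i] != '\0') {
		while (s[i] == ' ')			/* skip blanks between words */
			i++;
		if (s[i] == '\0')
			break;
		while (s[i] != '\0' && s[i] != ' ')	/* consume one word */
			i++;
		end = i;
		if (end >= minlen)			/* long enough, stop here */
			break;
	}
	return end;
}
```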
* Use a separate node->tag attribute rather than abusing the node->string attribute for the purpose. No functional change intended. The purpose is to make it possible to later attach tags to text nodes.
  (Ingo Schwarze, 2020-04-08, 1 file, -3/+3)
* Support manual tagging of .Pp, .Bd, .D1, .Dl, .Bl, and .It. In HTML output, improve the logic for writing inside permalinks: skip them when there is no child content or when there is a risk that the children might contain flow content.
  (Ingo Schwarze, 2020-04-06, 1 file, -7/+41)
* Copy tagged strings before marking hyphens as breakable. For example, this makes ":tCo-processes" work in ksh(1).
  (Ingo Schwarze, 2020-04-02, 1 file, -4/+8)
* Just like we are already doing it in HTML output, automatically tag section and subsection headers in terminal output, too. Even though admittedly, commands like "/SEE" and "/ Subsec" work, too, there is no downside, and besides, with the recent improvements in the tagging framework, implementation cost is negligible.
  (Ingo Schwarze, 2020-04-01, 1 file, -24/+38)
* Split tagging into a validation part including prioritization in tag.{h,c} and {mdoc,man}_validate.c and into a formatting part including command line argument checking in term_tag.{h,c}, html.c, and {mdoc|man}_{term|html}.c.
  Immediate functional benefits include:
    * Improved prioritization of automatic tags for .Em and .Sy.
    * Avoiding bogus automatic tags when .Em, .Fn, or .Sy are explicitly tagged.
    * Explicit tagging of .Er and .Fl now works in HTML output.
    * Automatic tagging of .IP and .TP now works in HTML output.
  But mainly, this patch provides clean earth to build further improvements on.
  Technical changes:
    * Main program: Write a tag file for ASCII and UTF-8 output only.
    * All formatters: There is no more need to delay writing the tags.
    * mdoc(7)+man(7) formatters: No more need for elaborate syntax tree inspection.
    * HTML formatter: If available, use the "string" attribute as the tag.
    * HTML formatter: New function to write permalinks, to reduce code duplication.
  Style cleanup in the vicinity while here:
    * mdoc(7) terminal formatter: To set up bold font for children, defer to termp_bold_pre() rather than calling term_fontpush() manually.
    * mdoc(7) terminal formatter: Garbage collect some duplicate functions.
    * mdoc(7) HTML formatter: Unify <code> handling, delete redundant functions.
    * Where possible, use switch statements rather than if cascades.
    * Get rid of some more Yoda notation.
  The necessity for such changes was first discussed with kn@, but i didn't bother him with a request to review the resulting -673/+782 line patch.
  (Ingo Schwarze, 2020-03-13, 1 file, -74/+126)
* Fully support explicit tagging of .Sh and .Ss. This fixes the offset of two lines in terminal output and this improves HTML output by putting the id= attribute and <a> element into the respective <h1> or <h2> element rather than writing an additional <mark> element.
  To that end, introduce node flags NODE_ID (to make the node a link target, for example by writing an HTML id= attribute or by calling tag_put()) and NODE_HREF (to make the node a link source, used only in HTML output, used only to write an <a class="permalink"> element).
  In particular:
    * In the validator, generalize the concept of the "next node" such that it also works before .Sh and .Ss.
    * If the first argument of .Tg is empty, don't forget to complain if there are additional arguments, which will be ignored.
    * In the terminal formatter, support writing of explicit tags for all kinds of nodes, not just for .Tg.
    * In deroff(), allow nodes to have an explicit string representation even when they aren't text nodes. Use this for explicitly tagged section headers. Surprisingly, this is sufficient to make HTML output work, without explicit code changes in the HTML formatter.
    * In syntax tree output, display NODE_ID and NODE_HREF.
  (Ingo Schwarze, 2020-02-27, 1 file, -6/+45)
* Introduce the concept of nodes that are semantically transparent: they are skipped when looking for previous or following high-level macros. Examples include roff(7) .ft, .ll, and .ta, mdoc(7) .Sm and .Tg, and man(7) .DT and .PD. Use this concept for a variety of improved decisions in various validators and formatters.
  While here,
    * remove a few const qualifiers on struct arguments that caused trouble;
    * get rid of some more Yoda notation in the vicinity;
    * and apply some other stylistic improvements in the vicinity.
  I found this class of issues while considering .Tg patches from kn@.
  (Ingo Schwarze, 2020-02-27, 1 file, -86/+70)
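A simplified sketch of such sibling lookups, assuming a doubly linked list of nodes that carry a boolean transparency flag. The struct and function names are hypothetical, not mandoc's own.

```c
#include <stddef.h>

/* Hypothetical, heavily simplified node; not mandoc's struct roff_node. */
struct node {
	const char	*name;		/* macro name, e.g. "Pp" or "Tg" */
	int		 transparent;	/* e.g. .ft, .Sm, .Tg, .PD */
	struct node	*prev;
	struct node	*next;
};

/* Return the following sibling, skipping semantically transparent nodes. */
static struct node *
node_next(struct node *n)
{
	do
		n = n->next;
	while (n != NULL && n->transparent);
	return n;
}

/* Return the preceding sibling, skipping semantically transparent nodes. */
static struct node *
node_prev(struct node *n)
{
	do
		n = n->prev;
	while (n != NULL && n->transparent);
	return n;
}
```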
* Introduce a new mdoc(7) macro .Tg ("tag") to explicitly mark a place as defining a term. Please only use it when automatic tagging does not work. Manual page authors will not be required to add the new macro; using it remains optional. HTML output is still rudimentary in this version and will be polished later.
  Thanks to kn@ for reminding me that i have been considering since BSDCan 2014 whether something like this might be useful. Given that possibilities of making automatic tagging better are running out and there are still several situations where automatic tagging cannot do the job, i think the time is now ripe.
  Feedback and no objection from millert@; OK espie@ inoguchi@ kn@.
  (Ingo Schwarze, 2020-01-19, 1 file, -2/+39)
* Align to the new, sane behaviour of the groff_mdoc(7) .Dd macro: without an argument, use the empty string, and always concatenate all arguments, no matter their number. This allows reducing the number of arguments of mandoc_normdate() and some other simplifications, at the same time polishing some error messages by adding the name of the macro in question.
  (Ingo Schwarze, 2020-01-19, 1 file, -8/+7)
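The described argument handling amounts to joining all arguments with single blanks. A sketch with the hypothetical helper `join_args()`, which is not part of mandoc:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch of the described .Dd argument handling:
 * no arguments yield the empty string, otherwise all arguments are
 * concatenated, separated by single blanks.  Returns a malloc(3)ed
 * string, or NULL if memory allocation fails.
 */
static char *
join_args(int argc, char *const argv[])
{
	size_t len = 1;		/* terminating NUL */
	char *buf, *p;
	int i;

	for (i = 0; i < argc; i++)	/* blank before all but the first */
		len += strlen(argv[i]) + (i > 0);
	if ((buf = malloc(len)) == NULL)
		return NULL;
	p = buf;
	for (i = 0; i < argc; i++) {
		size_t sz = strlen(argv[i]);

		if (i > 0)
			*p++ = ' ';
		memcpy(p, argv[i], sz);
		p += sz;
	}
	*p = '\0';
	return buf;
}
```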
* Improve validation of function names:
    1. Relax checking to accept function types of the form "ret_type (fname)(args)" (suggested by Yuri Pankov <yuripv dot net>).
    2. Tighten checking to require the closing parenthesis.
  (Ingo Schwarze, 2019-09-13, 1 file, -6/+12)
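The commit does not list the exact rules. As one ingredient, a balanced-parenthesis check like the hypothetical `parens_balanced()` below rejects a name whose closing parenthesis is missing while still allowing parenthesized names such as "(fname)":

```c
/*
 * Hypothetical sketch: accept a function name argument only if its
 * parentheses are balanced, so "(fname)" passes while "(fname" and
 * "fname)" are rejected.
 */
static int
parens_balanced(const char *s)
{
	int depth = 0;

	for (; *s != '\0'; s++) {
		if (*s == '(')
			depth++;
		else if (*s == ')' && --depth < 0)
			return 0;	/* closing without opening */
	}
	return depth == 0;	/* every opening one must be closed */
}
```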
* Fix mandoc_normdate() and the way it is used. In the past, it could return NULL but the calling code wasn't prepared to handle that. Make sure it always returns an allocated string. While here, simplify the code by handling the "quick" attribute inside mandoc_normdate() rather than at multiple callsites. Triggered by deraadt@ pointing out that snprintf(3) error handling was incomplete in time2a().
  (Ingo Schwarze, 2019-06-27, 1 file, -16/+4)
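A minimal sketch of the "always return an allocated string" contract, assuming an allocation wrapper that exits instead of returning NULL. `xstrdup()` and `format_date()` are hypothetical, and the ISO date format here is chosen for illustration, not taken from mandoc:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: duplicate a string or exit, never return NULL. */
static char *
xstrdup(const char *s)
{
	size_t sz = strlen(s) + 1;
	char *p = malloc(sz);

	if (p == NULL) {
		perror("malloc");
		exit(1);
	}
	return memcpy(p, s, sz);
}

/*
 * Format a date; if conversion or formatting fails, fall back to an
 * allocated empty string instead of returning NULL, so callers never
 * need a NULL check.
 */
static char *
format_date(time_t t)
{
	char buf[64];
	struct tm *tm;

	if ((tm = gmtime(&t)) == NULL ||
	    strftime(buf, sizeof(buf), "%Y-%m-%d", tm) == 0)
		return xstrdup("");
	return xstrdup(buf);
}
```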
* Contrary to what the NetBSD attribute(3) manual page suggests, using __dead instead of __attribute__((__noreturn__)) actually hinders portability rather than helping it. Given that mandoc already uses __attribute__ in several files and that in the portable version, ./configure already contains rudimentary support for ignoring it on platforms that do not support it, use __attribute__ directly. This is expected to fix build failures that Stephen Gregoratto <dev at sgregoratto dot me> reported from Arch and Debian Linux.
  (Ingo Schwarze, 2019-03-13, 1 file, -3/+3)
* mark check_abort() and post_abort() as __dead; based on a patch by Christos@ Zoulas at NetBSD
  (Ingo Schwarze, 2019-03-11, 1 file, -3/+3)
* When the -S option is given to man(1) and the requested manual page name is not found and the requested architecture is unknown, complain about the architecture rather than about the manual page name:
    $ man -S vax cpu
    man: Unknown architecture "vax".
    $ man -S sparc64 foobar
    man: No entry for foobar in the manual.
  Friendlier error message suggested by jmc@, who also OK'ed the patch.
  (Ingo Schwarze, 2019-03-04, 1 file, -40/+14)
* Fix the last straggler where the struct roff_node "line" member was abused to detect an input line break; instead, use the NODE_LINE flag to improve robustness.
  (Ingo Schwarze, 2019-03-04, 1 file, -2/+2)
* Use the new flag NODE_NOFILL in the validators, which is sometimes simpler and always more robust. In particular, move the nesting warnings for .EX and .EE from man_state(), where they were misplaced, to the man(7) validator.
  (Ingo Schwarze, 2018-12-31, 1 file, -5/+3)
* Cleanup, no functional change: Use the new parser flag ROFF_NOFILL in the mdoc(7) parser, too, instead of the old MDOC_LITERAL, which was an alias for the former MAN_LITERAL.
  (Ingo Schwarze, 2018-12-31, 1 file, -3/+3)
* Cleanup, minus 15 LOC, no functional change: Simplify the way the man(7) and mdoc(7) validators are called. Reset the parser state with a common function before calling them. There is no need to again reset the parser state afterwards, the parsers are no longer used after validation. This allows getting rid of man_node_validate() and mdoc_node_validate() as separate functions.
  (Ingo Schwarze, 2018-12-31, 1 file, -3/+3)
* Cleanup, no functional change: The struct roff_man used to be a bad mixture of internal parser state and public parsing results. Move the public results to the parsing result struct roff_meta, which is already public. Move the rest of struct roff_man to the parser-internal header roff_int.h. Since the validators need access to the parser state, call them from the top level parser during mparse_result() rather than from the main programs, also reducing code duplication. This keeps parser internal state out of the main programs (five in mandoc portable) and out of eight formatters.
  (Ingo Schwarze, 2018-12-30, 1 file, -3/+3)
* Almost mechanical diff to remove the "struct mparse *" argument from mandoc_msg(), where it is no longer used. While here, rename mandoc_vmsg() to mandoc_msg() and retire the old version: There is really no point in having another function merely to save "%s" in a few places. Minus 140 lines of code.
  (Ingo Schwarze, 2018-12-14, 1 file, -225/+160)
* Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all combinations are handled, and are handled in a systematic manner. This resolves some erratic duplicate handling, handles a number of missing cases, and improves diagnostics in various respects.
  Move validation of .br and .sp to the roff validation module rather than doing that twice in the mdoc and man validation modules. Move the node relinking function to the roff library where it belongs.
  In validation functions, only look at the node itself, at previous nodes, and at descendants, not at following nodes or ancestors, such that only nodes are inspected which are already validated.
  (Ingo Schwarze, 2018-12-04, 1 file, -41/+14)
* In the validators, translate obsolete macro aliases (Lp, Ot, LP, P) to the standard forms (Pp, Ft, PP) up front, such that later code does not need to look for the obsolete versions. This reduces the risk of incomplete handling.
  (Ingo Schwarze, 2018-12-03, 1 file, -15/+43)
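A table-driven sketch of that up-front translation, using exactly the alias pairs named in the commit. mandoc presumably works on macro token numbers rather than strings; the table and `canonical_macro()` are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Obsolete macro aliases and their standard forms. */
static const struct {
	const char	*alias;
	const char	*standard;
} aliases[] = {
	{ "Lp", "Pp" },		/* mdoc(7) */
	{ "Ot", "Ft" },		/* mdoc(7) */
	{ "LP", "PP" },		/* man(7) */
	{ "P",  "PP" },		/* man(7) */
};

/*
 * Hypothetical sketch: map an obsolete alias to its standard name once,
 * up front, so later code only has to handle the standard name.
 * Names that are not aliases are returned unchanged.
 */
static const char *
canonical_macro(const char *name)
{
	size_t i;

	for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
		if (strcmp(name, aliases[i].alias) == 0)
			return aliases[i].standard;
	return name;
}
```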
* Remove more pointer arithmetic passing via regions outside the array that is undefined according to the C standard. Robert Elz <kre at munnari dot oz dot au> pointed out i wasn't quite done yet.
  (Ingo Schwarze, 2018-08-17, 1 file, -2/+2)
* Do not calculate a pointer to a memory location before the beginning of a static array. Christos Zoulas, Robert Elz, and Andreas Gustafsson point out that is undefined behaviour by the C standard even if we never access the pointer.
  (Ingo Schwarze, 2018-08-16, 1 file, -4/+3)
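A generic C illustration of the rule, not mandoc's code: when scanning an array backwards, count an index down instead of decrementing a pointer past the first element. `find_last()` is a hypothetical example function.

```c
#include <stddef.h>

/*
 * Return the index of the last element equal to key, or -1.
 * A loop like "for (p = a + n - 1; p >= a; p--)" finally computes
 * a - 1, which is undefined behaviour even if never dereferenced
 * (and a + n - 1 is already invalid when n == 0).  Counting an index
 * down from n to 1 and reading a[i - 1] never forms such a pointer.
 */
static ptrdiff_t
find_last(const int *a, size_t n, int key)
{
	size_t i;

	for (i = n; i > 0; i--)
		if (a[i - 1] == key)
			return (ptrdiff_t)(i - 1);
	return -1;
}
```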
* Fix an off-by-one string read access that could happen if an empty string argument preceded a string argument beginning with "--". Found by Leah Neukirchen <leah at vuxu dot org> with -Wpointer-compare.
  (Ingo Schwarze, 2018-08-01, 1 file, -3/+2)
* Avoid a read access one byte beyond the end of an allocated string which occurred in situations like ".Fl a Cm --"; found by Leah Neukirchen <leah at vuxu dot org> with valgrind on Void Linux.
  (Ingo Schwarze, 2018-08-01, 1 file, -2/+2)
* preserve comments before .Dd when converting mdoc(7) to man(7) with mandoc -Tman; suggested by Thomas Klausner <wiz at NetBSD>
  (Ingo Schwarze, 2018-04-11, 1 file, -3/+6)
* use the portable \(lq and \(rq internally rather than \(Lq and \(Rq
  (Ingo Schwarze, 2018-04-05, 1 file, -3/+3)
* Ouch, fix previous: In the edge case of a single-character string containing nothing but a single hyphen, the pointer got incremented twice at one point, causing a read overrun found by naddy@.
  (Ingo Schwarze, 2018-03-16, 1 file, -2/+3)
* Style message about bad input encoding of em-dashes as -- instead of \(em. Suggested by Thomas Klausner <wiz at NetBSD>; discussed with jmc@.
  (Ingo Schwarze, 2018-03-16, 1 file, -9/+66)
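The commit does not describe its detection rules. A hypothetical heuristic in the same spirit, which also keeps every read inside the string, might look like this; `suspect_emdash()` is an assumed name and its rules are a guess, not mandoc's:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical heuristic: flag a word that probably uses "--" as an
 * em-dash, either a free-standing "--" or "--" between two letters as
 * in "word--word".  A leading "--", as in "--long", is not flagged.
 * The loop bound i + 2 < len keeps s[i + 2] inside the string, so
 * short inputs such as "" and "-" are safe.
 */
static int
suspect_emdash(const char *s)
{
	size_t i, len = strlen(s);

	if (strcmp(s, "--") == 0)
		return 1;
	for (i = 1; i + 2 < len; i++)
		if (s[i] == '-' && s[i + 1] == '-' &&
		    isalpha((unsigned char)s[i - 1]) &&
		    isalpha((unsigned char)s[i + 2]))
			return 1;
	return 0;
}
```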
* Delete the "no blank before trailing delimiter" check from the partial explicit macros. Leah Neukirchen <leah at vuxu dot org> rightfully points out that the check makes no sense for these macros.
  (Ingo Schwarze, 2018-02-06, 1 file, -10/+9)
* Do not segfault when there are two .Dt macros, the first without an architecture argument and the second with an invalid one. Bug found by jsg@ with afl(1).
  (Ingo Schwarze, 2017-09-12, 1 file, -2/+5)
* No longer use names that only occur in the SYNOPSIS section as names for man(1) lookup. For OpenBSD base and Xenocara, that functionality was never intended to be required, and i just fixed the last handful of offenders using it - not counting the horribly ill-designed interfaces engine(3) and lh_new(3) which are impossible to properly document in the first place.
  Of course, apropos(1) and whatis(1) continue to use SYNOPSIS .Nm, .Fn, and .Fo macros, so "man -k ENGINE_get_load_privkey_function" still works. This change also gets rid of a few bogus warnings "cross reference to self" which actually are *not* to self, like in yp(8).
  This former functionality was intended to help third-party software in the ports tree and on non-OpenBSD systems containing manual pages with incomplete or corrupt NAME sections. But it turned out it did more harm than good, and caused more confusion than relief, specifically for third party manuals and for maintainers of mandoc-portable on other operating systems. So kill it.
  Problems reported, among others, by Yuri Pankov (illumos). OK jmc@
  (Ingo Schwarze, 2017-08-02, 1 file, -6/+3)
* Fix an out of bounds read access to a constant array that caused segfaults on certain hardened versions of glibc. Triggered by .sp or blank lines right before .SS or .SH, or before the first .Sh. Found the hard way by Dr. Markus Waldner on Debian and by Leah Neukirchen on Void Linux.
  (Ingo Schwarze, 2017-07-31, 1 file, -2/+2)
* correctly handle letters in .Nx arguments; improves for example getpgid(2), ac(8), ldconfig(8), mount_ffs(8), sa(8), ttyflags(8), ...
  (Ingo Schwarze, 2017-07-20, 1 file, -1/+16)
* If -column, -diag, -inset, -item, or -ohang lists have a -width, don't just talk about ignoring it, actually do ignore it. No change for terminal output, improves HTML output.
  (Ingo Schwarze, 2017-07-15, 1 file, -5/+6)
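A sketch of the decision as a switch over list types. The enumeration and `width_applies()` are hypothetical, but the five list types that ignore -width are the ones named in the commit.

```c
/* Hypothetical enumeration of mdoc(7) .Bl list types. */
enum list_type {
	LIST_BULLET, LIST_COLUMN, LIST_DASH, LIST_DIAG, LIST_ENUM,
	LIST_HANG, LIST_HYPHEN, LIST_INSET, LIST_ITEM, LIST_OHANG, LIST_TAG
};

/*
 * Return 1 if a -width argument affects this list type.  For -column,
 * -diag, -inset, -item, and -ohang it has no effect, so a validator can
 * warn about it and then actually drop it.
 */
static int
width_applies(enum list_type t)
{
	switch (t) {
	case LIST_COLUMN:
	case LIST_DIAG:
	case LIST_INSET:
	case LIST_ITEM:
	case LIST_OHANG:
		return 0;
	default:
		return 1;
	}
}
```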
* report trailing delimiters after macros where they are usually a mistake; the idea came up in a discussion with Thomas Klausner <wiz at NetBSD>
  (Ingo Schwarze, 2017-07-03, 1 file, -49/+95)
* add warning "cross reference to self"; inspired by mdoclint
  (Ingo Schwarze, 2017-07-02, 1 file, -3/+13)
* Basic reporting of .Xrs to manual pages that don't exist in the base system, inspired by mdoclint(1). We are able to do this because (1) the -mdoc parser, the -Tlint validator, and the man(1) manual page lookup code are all in the same program and (2) the mandoc.db(5) database format allows fast lookup. Feedback from, previous versions tested by, and OK jmc@. A few features will be added to this in the tree, step by step.
  (Ingo Schwarze, 2017-07-01, 1 file, -2/+6)
* warn about some non-portable idioms in .Bl -column; triggered by a question from Yuri Pankov (illumos)
  (Ingo Schwarze, 2017-06-29, 1 file, -5/+20)
* warn about .Ns macros that have no effect because they are followed by an isolated closing delimiter; inspired by mdoclint
  (Ingo Schwarze, 2017-06-27, 1 file, -3/+6)
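The closing delimiters of mdoc(7) are `.` `,` `:` `;` `)` `]` `?` and `!`; since they are printed without a preceding blank anyway, an .Ns right before one of them changes nothing. A hypothetical helper, not mandoc's own function, to recognize an isolated closing delimiter:

```c
#include <string.h>

/*
 * Hypothetical sketch: return 1 if word is exactly one mdoc(7) closing
 * delimiter.  word[0] is tested first because strchr() would also
 * "find" the terminating NUL of the delimiter set.
 */
static int
is_closing_delim(const char *word)
{
	return word[0] != '\0' && word[1] == '\0' &&
	    strchr(".,:;)]?!", word[0]) != NULL;
}
```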
* Catch typos in .Sh names; suggested by jmc@. I'm using a very simple, linear time / zero space fuzzy string matching heuristic rather than a full Levenshtein metric, to keep the code both simple and fast.
  (Ingo Schwarze, 2017-06-25, 1 file, -2/+63)
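The heuristic itself is not described in the commit. As a stand-in with the same linear-time, constant-space properties (but not necessarily the same matches), here is a classic "at most one edit" comparison; `within_one_edit()` is a hypothetical name.

```c
#include <string.h>

/*
 * Return 1 if a and b differ by at most one substituted, inserted, or
 * deleted character.  One pass, O(n) time, O(1) extra space.
 */
static int
within_one_edit(const char *a, const char *b)
{
	size_t la = strlen(a), lb = strlen(b), i = 0, j = 0;
	int edits = 0;

	if (la > lb) {			/* make a the shorter string */
		const char *ts = a;
		size_t tl = la;

		a = b; b = ts;
		la = lb; lb = tl;
	}
	if (lb - la > 1)
		return 0;
	while (i < la && j < lb) {
		if (a[i] == b[j]) {
			i++;
			j++;
			continue;
		}
		if (++edits > 1)
			return 0;
		if (la == lb)		/* substitution: skip in both */
			i++;
		j++;			/* insertion: skip only in longer */
	}
	/* An unmatched trailing character in b is one more edit. */
	return edits + (int)(lb - j) <= 1;
}
```

Note that a swapped pair of letters counts as two edits here, so a typo such as NAEM for NAME would not be caught by this particular check.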
* operating system dependent message about unknown architecture; inspired by mdoclint
  (Ingo Schwarze, 2017-06-24, 1 file, -1/+40)
* in the base system, suggest leaving .Os blank; inspired by mdoclint
  (Ingo Schwarze, 2017-06-24, 1 file, -1/+8)