Ingo Schwarze [Thu, 15 Oct 2015 22:45:43 +0000 (22:45 +0000)]
Simplify the part of args() that is handling .Bl -column phrases:
Delete manual "Ta" handling because macro handling should
not be done in an argument parser but should be left to the
macro parsers, which exist anyway and work well.
No functional change, minus 40 lines of code.
Confusing and redundant code found while investigating
an old bug report from tim@.
Ingo Schwarze [Thu, 15 Oct 2015 22:27:24 +0000 (22:27 +0000)]
When blk_full() handles an .It line in .Bl -column and indirectly
calls phrase_ta() to handle a .Ta child macro, advance the body
pointer accordingly, such that a subsequent tab character rewinds
the right body block and doesn't fail an assertion. That happened
when there was nothing between the .Ta and the tab character.
Bug reported by tim@ some time ago.
Ingo Schwarze [Tue, 13 Oct 2015 23:30:50 +0000 (23:30 +0000)]
Reject the escape sequences \[uD800] to \[uDFFF] in the parser.
These surrogates are not valid Unicode codepoints,
so treat them just like any other undefined character escapes:
Warn about them and do not produce output.
Issue noticed while talking to stsp@, semarie@, and bentley@.
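The surrogate check can be sketched as follows. This is only an
illustration, not the code from the parser; the function name
codepoint_is_valid() is hypothetical.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not taken from mandoc: decide whether a
 * codepoint parsed from a \[uXXXX] escape may produce output.
 * Surrogates (U+D800 to U+DFFF) and values beyond U+10FFFF
 * are not valid Unicode scalar values.
 */
static bool
codepoint_is_valid(long cp)
{
	if (cp < 0 || cp > 0x10FFFF)
		return false;
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return false;
	return true;
}
```

A caller would warn and emit nothing when this returns false.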
Ingo Schwarze [Tue, 13 Oct 2015 22:59:54 +0000 (22:59 +0000)]
Major character table cleanup:
* Use ohash(3) rather than a hand-rolled hash table.
* Make the character table static in the chars.c module:
There is no need to pass a pointer around, we most certainly
never want to use two different character tables concurrently.
* No need to keep the characters in a separate file chars.in;
that merely encourages downstream porters to mess with them.
* Sort the characters to agree with the mandoc_char(7) manual page.
* Specify Unicode codepoints in hex, not decimal (that's the detail
that originally triggered this patch).
No functional change, minus 100 LOC, and i don't see a performance change.
Ingo Schwarze [Tue, 13 Oct 2015 15:53:05 +0000 (15:53 +0000)]
Reduce the amount of code by moving the three copies of the ohash
callback functions into one common place, preparing for the use of
ohash for some additional purposes. No functional change.
Ingo Schwarze [Mon, 12 Oct 2015 21:26:02 +0000 (21:26 +0000)]
Delete an assignment that is unconditionally overwritten two lines later;
found by Svyatoslav Mishyn <juef at openmailbox dot org>
with the clang static analyzer.
Ingo Schwarze [Mon, 12 Oct 2015 21:09:54 +0000 (21:09 +0000)]
Check the right pointer against NULL;
fixing a pasto introduced in the previous commit;
found by Svyatoslav Mishyn <juef at openmailbox dot org> with cppcheck.
Ingo Schwarze [Mon, 12 Oct 2015 15:29:35 +0000 (15:29 +0000)]
Use "-" rather than "\(hy" for the heads of .Bl -dash and -hyphen lists.
In UTF-8 output, that renders as ASCII HYPHEN-MINUS (U+002D)
rather than HYPHEN (U+2010), which looks better and matches groff.
In ASCII output, it makes no difference.
Suggested by naddy@.
Ingo Schwarze [Mon, 12 Oct 2015 00:32:55 +0000 (00:32 +0000)]
Clear dform and dsec when exiting a first-level directory in treescan().
Fixes a segfault reported by bentley@.
While here, do some style cleanup in the same function.
Ingo Schwarze [Mon, 12 Oct 2015 00:08:15 +0000 (00:08 +0000)]
To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statements. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.
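To illustrate the convention, here is a made-up switch statement;
the function and its return values are assumptions for the sake of
the example and do not come from the mandoc sources.

```c
/*
 * Illustrative only.  Adjacent case labels need no comment,
 * while a case that executes code before falling through
 * keeps its FALLTHROUGH marker.
 */
static int
classify(int c, int *seen_tab)
{
	switch (c) {
	case ' ':
	case '\n':
		return 1;
	case '\t':
		*seen_tab = 1;
		/* FALLTHROUGH */
	case '\v':
		return 2;
	default:
		return 0;
	}
}
```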
Ingo Schwarze [Sun, 11 Oct 2015 22:00:52 +0000 (22:00 +0000)]
Drop tags containing a blank character:
They don't work, they break other tags in weird ways, and even
if they could be made to work, they would be mostly useless.
Issue reported by naddy@, thanks.
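A minimal sketch of such a filter, assuming that "blank" means a
space or a tab; the name tag_has_blank() is hypothetical and the
real check may differ.

```c
#include <string.h>

/*
 * Hypothetical check: return 1 if the tag contains a space
 * or a tab and should therefore be dropped, 0 otherwise.
 */
static int
tag_has_blank(const char *tag)
{
	return strpbrk(tag, " \t") != NULL;
}
```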
Ingo Schwarze [Sun, 11 Oct 2015 21:12:54 +0000 (21:12 +0000)]
Finally use __progname, err(3) and warn(3).
That's more readable and less error-prone than fumbling around
with argv[0], fprintf(3), strerror(3), perror(3), and exit(3).
It's a bad idea to boycott good interfaces merely because standards
committees ignore them. Instead, let's provide compatibility modules
for archaic systems (like commercial Solaris) that still don't have
them. The compat module has a UCB Copyright (c) 1993...
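For comparison, a small sketch of the warn(3) style, assuming a
system that provides <err.h> (the BSDs and glibc do). The function
check_readable() is invented for this example.

```c
#include <err.h>
#include <stdio.h>

/*
 * Hypothetical example: return 0 if the file can be opened
 * for reading, or -1 after printing a diagnostic.
 */
static int
check_readable(const char *path)
{
	FILE *fp;

	if ((fp = fopen(path, "r")) == NULL) {
		/* Prints the program name, the path, and strerror(errno). */
		warn("%s", path);
		return -1;
	}
	fclose(fp);
	return 0;
}
```

Where the failure is fatal, err(3) prints the same kind of message
and exits, which replaces the fprintf(3), strerror(3), and exit(3)
sequence.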
Fix multiple aspects of SYNOPSIS .Nm formatting:
* Don't break lines before non-block .Nm elements.
* Use proper <b> markup for the heads of .Nm blocks.
* Make the width measurements work by doing them on the head children.
Trailing whitespace is significant when determining the width of a tag
in mdoc(7) .Bl -tag and man(7) .TP, but not in man(7) .IP.
Quirk reported by Jan Stary <hans at stare dot cz> on ports@.
Remove the warning about children of .Vt blocks because actually,
.Vt type global_variable No = Dv defined_constant ;
is the best way to specify in the SYNOPSIS how a global variable
is initialized in the rare case where that matters.
Issue noticed by jmc@.
Fill mode changes don't break next-line scope in all cases,
in particular not for tagged paragraphs.
Issue found by Christian Neukirchen <chneukirchen at gmail dot com>
in the exiv2(1) manual page.
Recommend an unambiguous escape for minus signs instead of \-.
Historically, \- was used in troff for three cases: flags/pathnames,
en dashes, and minus signs. mandoc_char(7) currently recommends it
for minus signs, recommends \(en for en dashes, and doesn't mention
flags/pathnames.
In the old days, nroff rendered \- as ASCII '-', and troff rendered
it as en dash/minus (which were visually indistinguishable).
In Unicode, en dashes and minus signs are semantically distinct and
encoded differently (U+2013 for en dash, U+2212 for minus), and
often rendered differently too. Meanwhile ASCII '-' has been renamed
"hyphen-minus" and fonts typically render it closest to a hyphen, not
a minus.
There is very little consistency across roff implementations and output
formats for what Unicode character \- corresponds to. So at least for
minus signs, change the recommendation to the unambiguous \(mi escape.
from bentley@, ok jmc@ (after reams of discussion)
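The unambiguous escapes and the codepoints named in this log can be
summarized as a small lookup table. This is a sketch for
illustration; the struct and function names are hypothetical and
unrelated to the real character table.

```c
#include <stddef.h>
#include <string.h>

struct dash_escape {
	const char	*escape;	/* roff escape sequence */
	unsigned int	 codepoint;	/* Unicode codepoint */
};

/* Escapes that each map to exactly one Unicode character. */
static const struct dash_escape dash_escapes[] = {
	{ "\\(hy", 0x2010 },	/* hyphen */
	{ "\\(en", 0x2013 },	/* en dash */
	{ "\\(mi", 0x2212 },	/* minus sign */
};

/* Return the codepoint for an escape, or 0 if it is not listed. */
static unsigned int
dash_codepoint(const char *escape)
{
	size_t	 i;

	for (i = 0; i < sizeof(dash_escapes) / sizeof(dash_escapes[0]); i++)
		if (strcmp(dash_escapes[i].escape, escape) == 0)
			return dash_escapes[i].codepoint;
	return 0;
}
```

The ambiguous \- is deliberately absent from the table.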
Ingo Schwarze [Sun, 30 Aug 2015 21:10:56 +0000 (21:10 +0000)]
Drop leading, internal, and trailing blank characters in \o (overstrike)
escape sequences; that's cleaner for all output modes, and it's required
to prevent the PostScript/PDF formatter from dying on assertions.
Bug found by jsg@ with afl.
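Dropping the blanks amounts to an in-place filter over the escape
argument. The following is a sketch under that assumption, with a
hypothetical function name; it is not the code from the formatter.

```c
/*
 * Hypothetical helper: remove all space characters from
 * the argument of an overstrike escape, in place.
 */
static void
strip_blanks(char *s)
{
	char	*dst;

	for (dst = s; *s != '\0'; s++)
		if (*s != ' ')
			*dst++ = *s;
	*dst = '\0';
}
```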
Ingo Schwarze [Sat, 29 Aug 2015 23:56:01 +0000 (23:56 +0000)]
If we have to reparse the text line because we spring an input line trap,
we must not escape breakable hyphens yet, or mparse_buf_r() in read.c
will complain and replace the escaped hyphens with question marks.
Bug found in ocserv(8) following a report from Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 22:40:05 +0000 (22:40 +0000)]
Parse and ignore the escape sequences \, and \/ (italic corrections).
Actually using these is very stupid because they are groff extensions
and other roff(7) implementations typically print unintended characters
at the places where they are used.
Nevertheless, some manuals contain them, for example ocserv(8).
Problem reported by Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 21:37:20 +0000 (21:37 +0000)]
Implement the escape sequence \\$*, expanding to all arguments
of the current user-defined macro.
This is another missing feature required for ocserv(8).
Problem reported by Kurt Jaeger <pi at FreeBSD>.
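Conceptually, \\$* joins the macro arguments with single spaces.
A rough sketch, assuming the arguments are already available as an
array of strings; join_args() is a hypothetical name, not a roff.c
function.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a newly allocated string holding
 * all arguments separated by single spaces, or NULL if memory
 * allocation fails.  The caller frees the result.
 */
static char *
join_args(int argc, const char *const *argv)
{
	size_t	 len;
	char	*buf;
	int	 i;

	len = 1;	/* terminating NUL */
	for (i = 0; i < argc; i++)
		len += strlen(argv[i]) + (i > 0);	/* plus separator */
	if ((buf = malloc(len)) == NULL)
		return NULL;
	buf[0] = '\0';
	for (i = 0; i < argc; i++) {
		if (i > 0)
			strcat(buf, " ");
		strcat(buf, argv[i]);
	}
	return buf;
}
```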
Ingo Schwarze [Sat, 29 Aug 2015 20:26:04 +0000 (20:26 +0000)]
Minimal implementation of the read-only number register \n(.$
which returns the number of arguments of the current macro.
This is one of the missing features required for ocserv(8).
Problem reported by Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 15:28:13 +0000 (15:28 +0000)]
Including <ohash.h> requires including <stdint.h> before,
and "config.h" was missing as well.
Patch from Svyatoslav Mishyn <juef at openmailbox dot org>, Crux Linux.
Remove the hack of scrolling forward and backward with +G1G that
many (jmc@, millert@, espie@, deraadt@) considered revolting.
Instead, when using a pager, since we are using a temporary file
for tags anyway, use another temporary file for the formatted
page(s), as suggested by millert@ and similar to what the traditional
BSD man(1) did, except that we use only one single temporary output
file rather than one for each formatted manual page, such that
searching (both with / and :t) works across all the displayed files.
Simplify and make tag_put() more efficient by integrating tag_get()
into it and by only handling NUL-terminated strings.
Minus 25 lines of code, no functional change.
When creation of the temporary tags file fails, call the pager
without the -T option, because otherwise the pager won't even start.
Fixing a bug reported by jca@.
While here, shorten the code by two lines
and delete one internal interface function.
Do not fork and exec gunzip(1), just link with libz instead.
As discussed with deraadt@, that's cleaner and will help tame(2).
Something like this was also suggested earlier by bapt at FreeBSD.
Minus 50 lines of code, deleting one interface function (mparse_wait),
no functional change intended.
Insist that manual page file name extensions must begin with a digit,
lest pkg.conf(5) be shown when pkg(5) is asked for;
issue reported by Michael Reed <m dot reed at mykolab dot com>.
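The rule can be sketched like this, assuming the lookup compares a
requested name against a file name; file_matches_name() is a
hypothetical function and the real matching code may be organized
differently.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical check: the file belongs to the manual "name" only
 * if it consists of the name, a dot, and an extension that begins
 * with a digit.
 */
static int
file_matches_name(const char *file, const char *name)
{
	size_t	 len;

	len = strlen(name);
	if (strncmp(file, name, len) != 0 || file[len] != '.')
		return 0;
	return isdigit((unsigned char)file[len + 1]) != 0;
}
```

With this rule, pkg.conf.5 no longer matches a request for pkg
because its extension would start with "c".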
Initial, still somewhat experimental implementation to leverage
less(1) -T and :t ctags(1)-like functionality to jump to the
definitions of various terms inside manual pages.
To be polished in the tree, so bear with me and report issues.
Technically, if less(1) is used as a pager, information is collected
by the mdoc(7) terminal formatter, first stored using the ohash
library, then ultimately written to a temporary file which is passed
to less via -T. No change intended for other output formatters or
when running without a pager.
Based on an idea from Kristaps using feedback from many, in particular
phessler@ nicm@ millert@ halex@ doug@ kspillner@ deraadt@.
Fix the "depend" target and regenerate Makefile.depend:
* do not process the test-*.c files, they are not built via make
* add the missing compat_stringlist.c and soelim.c
* read.c now uses roff_int.h
* roff.c no longer uses libmdoc.h
Ingo Schwarze [Thu, 7 May 2015 12:08:13 +0000 (12:08 +0000)]
Do not let the -m option or MANPATH with leading, trailing, or double
colon override the default manpath, let them add to the default manpath.
Only override the default manpath by the -M option, by MANPATH without
leading, trailing, or double colon, or by "manpath" in man.conf(5).
Problem reported by Jan Stary <hans at stare dot cz>.
Patch OK'ed by millert@.
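The MANPATH part of this rule boils down to a simple test on the
string. A sketch with a hypothetical function name; treating an
empty value as "does not extend" is an assumption of this example.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if the MANPATH value has a leading,
 * trailing, or double colon and hence adds to the default manpath,
 * or 0 if it overrides the default manpath.
 */
static int
manpath_extends_default(const char *manpath)
{
	size_t	 len;

	len = strlen(manpath);
	if (len == 0)
		return 0;	/* assumption, see above */
	return manpath[0] == ':' || manpath[len - 1] == ':' ||
	    strstr(manpath, "::") != NULL;
}
```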
Ingo Schwarze [Fri, 1 May 2015 16:58:33 +0000 (16:58 +0000)]
mdoc_valid_post() may indirectly call roff_node_unlink() which may
set ROFF_NEXT_CHILD, which is desirable for the final call to
mdoc_valid_post() - in case the target itself gets deleted, the
parse point may need this adjustment - but not for the intermediate
calls - if intermediate nodes get deleted, that mustn't clobber the
parse point. So move setting ROFF_NEXT_SIBLING to the proper place
in rew_last().
This fixes the assertion failure in jsg@'s afl test case 108/Apr27.
Ingo Schwarze [Fri, 1 May 2015 16:02:47 +0000 (16:02 +0000)]
Setting the "last" member of struct roff_node was done at an extremely
weird place. Move it to the obviously correct place.
Surprisingly, this didn't cause any misformatting in the test suite
or in any base system manuals, but i cannot believe the code was
really correct for all conceivable input, and it would be very hard
to verify. At the very least, it cannot have worked for man(7).
Ingo Schwarze [Fri, 1 May 2015 15:27:54 +0000 (15:27 +0000)]
Minor bug fix: When .Pp rewinds .Nm, rewind the whole block,
not just the body. In some unusual edge cases, this caused
the .Pp to become a sibling of the .Nm body inside the .Nm block.
If a block body gets broken, that's no good reason to extend the
scope of the end macro. Instead, only keep the tail scope open if
the end macro calls an explicit macro and actually breaks
that. This corrects syntax tree structure and fixes an assertion
found by jsg@ with afl (test case 098/Apr27).
Replace the kludge for the \z escape sequence by an actual
implementation. As a side effect, minus ten lines of code.
As another side effect, this also fixes the assertion failure that
used to be triggered by "\z\o'ab'c" at the beginning of an output
line, found by jsg@ with afl (test case 022/Apr27).
Do not mark a block with the MDOC_BROKEN flag if it merely contains
a mismatching explicit end macro without actually being broken.
Avoids a subsequent upward search for the non-existent breaker
ending up in a NULL pointer access; afl test case 005/Apr27 from jsg@.
When the last line of a table layout turns out to be empty, it is deleted.
Do not just free the struct tbl_row but also make sure that no pointer
to it remains. Fixing a use after free found by jsg@ with afl.
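The general pattern is to unlink the row from every place that
refers to it before freeing it. The data structures below are
simplified stand-ins invented for this sketch; they are not the
real struct tbl_row and its owners.

```c
#include <stdlib.h>

struct row {
	struct row	*next;
	int		 ncells;	/* 0 means the row is empty */
};

struct table {
	struct row	*first_row;
	struct row	*last_row;
};

/* Delete the last row if it is empty, leaving no dangling pointer. */
static void
drop_empty_last_row(struct table *tbl)
{
	struct row	*rp, *prev;

	if ((rp = tbl->last_row) == NULL || rp->ncells > 0)
		return;
	prev = NULL;
	if (tbl->first_row != rp)
		for (prev = tbl->first_row; prev->next != rp;
		    prev = prev->next)
			continue;
	if (prev == NULL)
		tbl->first_row = NULL;
	else
		prev->next = NULL;
	tbl->last_row = prev;
	free(rp);
}
```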
Unify mdoc_deroff() and man_deroff() into a common function deroff().
No functional change except that for mdoc(7), it now skips leading
escape sequences just like it already did for man(7).
Escape sequences rarely occur in mdoc(7) code and if they do,
skipping them is an improvement in this context.
Minus 30 lines of code.
Avoid out-of-bounds read access before the beginning of the
mdoc_macros[] array. This sometimes prevented proper warnings
about text nodes preceding the first section header.
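A guard of the following kind avoids such a read. The token values
and the table are made up for this sketch, assuming that text nodes
carry a negative "no macro" token; the real mdoc_macros[] array
looks different.

```c
#include <stddef.h>

/* Hypothetical token values; TOK_NONE stands for "no macro". */
enum {
	TOK_NONE = -1,
	TOK_SH,
	TOK_SS,
	TOK_MAX
};

static const char *const macro_names[TOK_MAX] = { "Sh", "Ss" };

/* Return the macro name, or NULL for text nodes and bad tokens. */
static const char *
macro_name(int tok)
{
	if (tok < 0 || tok >= TOK_MAX)
		return NULL;	/* never index before or past the array */
	return macro_names[tok];
}
```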
More than one data field may follow T} on the same input line.
Issue found by Christian Neukirchen <chneukirchen at gmail dot com>
in the socket(2) manual on Linux.
Also fixes major rendering bugs (including partial loss of content)
in XkbChangeControls(3), XkbFreeClientMap(3), XkbGetMap(3),
XkbKeyNumGroups(3), and XkbSetMap(3).
If an explicit line break request (.br or .sp) occurs within an .HP block,
the next line doesn't hang, but is simply indented.
Issue found by Christian Neukirchen <chneukirchen at gmail dot com>
in the dmsetup(8) manual on Linux.
This patch also improves the indentation of XDGA(3) and XrmGetResource(3).
Unify trickier node handling functions.
* man_elem_alloc() -> roff_elem_alloc()
* man_block_alloc() -> roff_block_alloc()
The functions mdoc_elem_alloc() and mdoc_block_alloc() remain for
now because they need to do mdoc(7)-specific argument processing.
Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.
Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.
Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.
Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.
Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.
Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.
Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.