Ingo Schwarze [Thu, 16 Feb 2017 10:56:07 +0000 (10:56 +0000)]
Fix rev. 1.280: -O syntax is different in default apropos(1) output
mode and in other output modes, so do not error out prematurely.
Also sort local variables in main() while here.
Ingo Schwarze [Thu, 16 Feb 2017 09:47:31 +0000 (09:47 +0000)]
Fix block scoping error if an explicit block is broken by two
implicit blocks (.Aq Bq Po .Pc) that left the outer breaker open
and could in exceptional cases, like between .Bl and .It, cause
tree corruption leading to NULL dereference.
Found by tb@ with afl(1).
While here, do not mark intermediate ENDBODY markers as broken.
Ingo Schwarze [Wed, 15 Feb 2017 15:58:46 +0000 (15:58 +0000)]
Style improvement, no functional change.
As reported by Yuri Pankov, some versions of GCC whine that "tmp"
might be used uninitialized in fts_open(3). Clearly, that cannot
actually happen, but explicitly setting it to NULL is safer anyway.
While here, rename the badly named variable "tmp" and make the
inner "if" easier to understand.
Ingo Schwarze [Wed, 15 Feb 2017 14:10:08 +0000 (14:10 +0000)]
Fix previous: I forgot that I had to change the convention for how
a node is marked as "not a macro" when unifying the parsers.
Confirmed to work by Sevan Janiyan.
Ingo Schwarze [Sat, 11 Feb 2017 21:49:50 +0000 (21:49 +0000)]
Do not read one element past the end of the static const termacts array.
Bug found by Sevan Janiyan <venture37 at geeklan dot co dot uk>
who ran the OpenBSD mandoc test suite on Ubuntu on POWER8 (sic!)
and reported that mdoc/Sh/before.in failed in -Tman mode.
If that isn't power testing, i don't know...
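A minimal sketch of the kind of guard that avoids such an overread, assuming a table indexed by macro number. The names tok, acts and act_lookup() are hypothetical and are not the real termacts code:

```c
#include <stddef.h>

/* Hypothetical macro identifiers; not the real mandoc token list. */
enum tok { TOK_SH, TOK_SS, TOK_PP, TOK_MAX };

struct act {
	const char	*name;
};

/* One entry per macro; valid indices are 0 to TOK_MAX - 1. */
static const struct act acts[TOK_MAX] = {
	{ "Sh" }, { "Ss" }, { "Pp" }
};

/* Return the table entry for tok, or NULL if tok is out of range. */
const struct act *
act_lookup(int tok)
{
	if (tok < 0 || tok >= TOK_MAX)
		return NULL;
	return &acts[tok];
}
```

Callers then have to handle NULL explicitly instead of reading whatever happens to follow the array in memory.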
Ingo Schwarze [Sat, 11 Feb 2017 17:53:33 +0000 (17:53 +0000)]
Disable three UTF-8 tests that expose bugs in wcwidth(3) in the
native C libraries of illumos, Oracle Solaris 11, and SunOS 5.10.
While it is useful to catch wcwidth(3) regressions on OpenBSD, the
purpose of the *portable* mandoc regression suite is not to check
the C library of the host system; that would just hide genuine
mandoc portability issues in the noise. The remaining UTF-8 tests
are still sufficient to establish that mandoc does the right thing.
Issues reported by Sevan Janiyan <venture37 at geeklan dot co dot uk>
after testing on OmniOS.
Ingo Schwarze [Sat, 11 Feb 2017 15:47:16 +0000 (15:47 +0000)]
Never look for broken blocks inside blocks that are already closed.
Fixes the last of the tree corruptions sometimes causing NULL dereference
reported by tb@; this one triggered in cases like: .Bl -column .It Pq Ta
Ingo Schwarze [Sat, 11 Feb 2017 14:11:17 +0000 (14:11 +0000)]
Do not prematurely close .Nd containing a broken child.
Fixes tree corruption leading to NULL dereference
in insane cases like .Oo Oo .Nd .Pq Oc .Oc Oc
found by tb@ with afl(1).
Ingo Schwarze [Sat, 11 Feb 2017 13:24:12 +0000 (13:24 +0000)]
Do not prematurely mark intermediate blocks as broken while scanning
backwards. Only do so when a block is found that is actually broken.
Logic error found while investigating crashes reported by tb@.
Ingo Schwarze [Fri, 10 Feb 2017 22:19:18 +0000 (22:19 +0000)]
For child macros of block-end macros, only scan backwards for pending
breakers if the parent of the block is not already closed. While
the scanning is needed in cases like ".Ac Bo" for broken Ao, it is
useless and crashy in cases like ".Ac Bc" for non-broken Ao.
This fixes a NULL pointer dereference that tb@ found with afl(1).
Ingo Schwarze [Fri, 10 Feb 2017 16:20:34 +0000 (16:20 +0000)]
In the SYNOPSIS, .Nm blocks can get broken if one of their children
gets broken. In that case, mark them as BROKEN and ENDED and make
sure they get closed out together with the child.
Fixes tree corruption leading to a NULL dereference found by tb@
with afl(1) in: .Sh SYNOPSIS .Bl .Oo .Nm .Bk .Oc .It (where .Bk is
the child and .Oo is the breaker).
A simpler form of the same corruption (without crash) is visible in:
.Sh SYNOPSIS .Ao .Nm .Bo .Ac .Bc text
where the text ended up inside the .Nm (child .Bo, breaker .Ao).
Ingo Schwarze [Thu, 9 Feb 2017 20:53:33 +0000 (20:53 +0000)]
same as mandocdb.c rev. 1.196:
for portability, use (char *)NULL in execlp(3) as discussed on tech@
OpenBSD (didn't blow up anywhere yet, but better safe than sorry)
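For illustration, a small sketch of the idiom. NULL may be defined as a plain integer 0, and in a variadic call like execlp(3) such an argument is passed as an int rather than as a pointer, so the cast makes the terminator a real null pointer. The helper run_true() is hypothetical and assumes a true(1) utility in the PATH:

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run "true" via execlp(3); return its exit status, or -1 on error. */
int
run_true(void)
{
	pid_t	 pid;
	int	 status;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* The cast guarantees a pointer-sized terminator. */
		execlp("true", "true", (char *)NULL);
		_exit(127);	/* only reached if execlp(3) failed */
	}
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```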
Ingo Schwarze [Thu, 9 Feb 2017 18:46:44 +0000 (18:46 +0000)]
Illumos doesn't have O_DIRECTORY. Work around that for now, may
fix it better after the 1.14.1 release. Portability issue reported
by Sevan Janiyan <venture37 at geeklan dot co dot uk>.
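One possible shape of such a workaround, which is not necessarily the one committed here: define the missing flag as 0 and verify with fstat(2) that the descriptor refers to a directory. The helper open_dir() is a hypothetical name:

```c
#include <sys/stat.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

#ifndef O_DIRECTORY
#define O_DIRECTORY 0	/* flag missing: rely on the fstat(2) check */
#endif

/* Open path read-only; fail with ENOTDIR unless it is a directory. */
int
open_dir(const char *path)
{
	struct stat	 sb;
	int		 fd;

	fd = open(path, O_RDONLY | O_DIRECTORY);
	if (fd == -1)
		return -1;
	if (fstat(fd, &sb) == -1) {
		close(fd);
		return -1;
	}
	if (!S_ISDIR(sb.st_mode)) {
		close(fd);
		errno = ENOTDIR;
		return -1;
	}
	return fd;
}
```

Unlike a real O_DIRECTORY, this check runs after the open, so a non-directory is briefly opened before being rejected.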
Ingo Schwarze [Mon, 6 Feb 2017 03:44:58 +0000 (03:44 +0000)]
The .Nm macro does not only use the default name when it has no
argument, but also when the first argument is a child macro.
Arcane issue found in the FreeBSD cxgbetool(8) manual that Baptiste
Daroussin <bapt at FreeBSD> sent me long ago for a different reason.
While solving this, switch to the new technique of doing text
production in the validator, reducing code duplication in the
formatters, which also makes -Ttree output clearer.
Ingo Schwarze [Sun, 5 Feb 2017 18:15:39 +0000 (18:15 +0000)]
Improve <table> syntax:
The <col> element can only appear inside <colgroup>, so use <colgroup>.
The <tbody> element is optional and useless, so don't use it.
Even if we ever needed <thead> or <tfoot>, <tbody> would still be
optional and useless; besides, we will likely never need <thead> or <tfoot>,
simply because our languages don't support such functionality.
Ingo Schwarze [Sat, 4 Feb 2017 11:58:09 +0000 (11:58 +0000)]
Do not fix the default indent for all subsequent files; some may use
a different macro language and hence require a different indent.
You can see the effect with "man -a 1 host hostname".
Ingo Schwarze [Fri, 3 Feb 2017 18:18:23 +0000 (18:18 +0000)]
Minor cleanup, no functional change:
We always have a roff parser, so mparse_free() does not need to check
for existence before freeing it.
Also arrange code in struct mparse, mparse_reset(), and mparse_free()
in the same order for readability.
Ingo Schwarze [Fri, 3 Feb 2017 17:56:59 +0000 (17:56 +0000)]
If an application parses multiple files with mparse_readfd(3) but
without using mparse_open(3) to open the files, and if one of the
files includes a gzip'ed file with .so, then the gzip flag remains
set and the next main file will be expected to be gzip'ed.
Fix this by clearing the gzip flag in mparse_reset(3).
Bug found and patch provided by Michael <Stapelberg at debian dot org>.
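The idea can be sketched as follows. The struct is a hypothetical, much reduced stand-in for the parser state, not the real struct mparse, and parse_reset() is not the real mparse_reset(3):

```c
/* Hypothetical, much reduced stand-in for the parser state. */
struct parse_state {
	int	 gzip;	/* nonzero while reading a gzip'ed file */
	int	 line;	/* current input line number */
};

/* Reset all per-file state before parsing the next main file. */
void
parse_reset(struct parse_state *p)
{
	p->line = 0;
	p->gzip = 0;	/* do not inherit the flag from an earlier .so */
}
```

Any state that an included file can change has to be cleared in the reset function, or it leaks into the next file.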
Ingo Schwarze [Mon, 30 Jan 2017 20:24:02 +0000 (20:24 +0000)]
Rework fill mode handling for -man -Thtml.
Basically, open <pre> whenever printing text in no-fill mode and it is
not already open, and close it whenever printing something that cannot
be inside <pre>.
This fixes a crash reported by Michael <Stapelberg at debian dot org>
in the French Linux chroot(2) manual and also improves rendering
for OpenBSD pages like DPMSGetTimeouts(3) and GLwDrawingArea(3).
These changes also permitted retiring struct mhtml.
Ingo Schwarze [Sat, 28 Jan 2017 23:30:08 +0000 (23:30 +0000)]
Add a warning "new sentence, new line".
This does not attempt to pinpoint each and every offender, but
instead tries very hard to avoid false positives: Currently, there
are only two false positives in the whole OpenBSD base system.
Only do this in mdoc(7), not in man(7), because manuals written
in man(7) typically have much worse problems than this.
OK jmc@ on a previous version of the patch
Ingo Schwarze [Sat, 28 Jan 2017 18:43:00 +0000 (18:43 +0000)]
.Bl -column with zero columns is legal, so don't segfault on it.
Bug introduced in rev. 1.248 triggered for example in gssapi(3),
analyzed and reported by Michael <Stapelberg at debian dot org>.
Simplify the code a bit more while here.
Ingo Schwarze [Thu, 26 Jan 2017 18:28:18 +0000 (18:28 +0000)]
Fix -man -Thtml formatting after .nf (which has nothing to do
with "literal", by the way, it means "no fill"):
* Use <pre> such that whitespace is preserved.
* Preserve line breaks.
* For font alternating macros, avoid node recursion which required
scary juggling with the fill state. Instead, simply print the text
children directly.
Missing feature first noticed by kristaps@ in 2011,
then again reported by afresh1@ in 2016,
and finally reported here: https://github.com/Debian/debiman/issues/21 ,
which i only found because of Shane Kerr's comment here:
https://plus.google.com/110314300533310775053/posts/H1eaw9Yskoc
Ingo Schwarze [Wed, 25 Jan 2017 02:14:43 +0000 (02:14 +0000)]
Improve HTML formatting of .Bl -tag.
In particular, when using the style sheet, put the body on the same
line as the head for short heads, or on the next line for long
heads, in a way that preserves both correct indentation and correct
vertical spacing with and without -compact, and with one or more
heads per body (hi, Zaphod) - eight use cases so far - and with and
without -tag, and with and without -offset, 32 use cases grand total.
Using many ideas from zhuk@, from <David dot Dahlberg at fkie dot
fraunhofer dot de>, and from Benny Lofgren <bl dash lists at lofgren
dot biz>, and a few of my own.
This is an excellent demonstration that CSS is an extremely hostile
language, much more trapful and much harder to use than, say, C.
When matthew@ reported this in July 2014 (!), it was already a known
issue, and i no longer remember for how long. My first serious
attempt at fixing it (in November 2015) failed miserably. I'd love
to see simplifications of both the generated HTML code and of the
style sheet, but without breaking any of the 32 use cases, please.
Ingo Schwarze [Thu, 19 Jan 2017 01:00:14 +0000 (01:00 +0000)]
Implement line breaking of the generated HTML code at space characters
in filled text. This does not affect HTML semantics, but makes the
HTML code even more humanly readable.
While here,
- collapse multiple consecutive space characters in filled text
- and insert a blank between style entries.
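A rough sketch of the space collapsing step only, assuming the filled text is available as a NUL-terminated string. The function collapse_spaces() is hypothetical and does not reproduce the actual HTML formatter code:

```c
#include <stddef.h>

/*
 * Copy src to dst, collapsing each run of spaces into one space.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the number of bytes written, not counting the NUL.
 */
size_t
collapse_spaces(char *dst, const char *src)
{
	size_t	 n = 0;
	int	 prev_space = 0;

	for (; *src != '\0'; src++) {
		if (*src == ' ') {
			if (prev_space)
				continue;	/* skip repeated spaces */
			prev_space = 1;
		} else
			prev_space = 0;
		dst[n++] = *src;
	}
	dst[n] = '\0';
	return n;
}
```

Because browsers collapse whitespace in filled text anyway, a transformation of this kind does not change how the page renders.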
Ingo Schwarze [Wed, 18 Jan 2017 19:22:21 +0000 (19:22 +0000)]
Make HTML output more human readable by overhauling line break logic
around tags and by introducing some simple indentation.
No change of HTML semantics intended.
Ingo Schwarze [Tue, 17 Jan 2017 15:32:43 +0000 (15:32 +0000)]
Completely delete the buf field of struct html and all the buf*()
interfaces. Such a static buffer was a bad idea in the first place,
causing unfixable truncation that was only prevented by triggering
an assertion failure. Instead, let the small number of remaining
users allocate and free their own, temporary dynamic buffers,
or for the case of .Xr and .In, pass the original data to be
assembled in print_otag().
Ingo Schwarze [Sun, 15 Jan 2017 15:28:55 +0000 (15:28 +0000)]
When looking up macro values while the macro tables are being built
in makewhatis(8), use ohash rather than linear searches.
This was identified as the main makewhatis(8) performance bottleneck
by Baptiste Daroussin <bapt at FreeBSD>, who also suggested part
of the improved algorithm.
This reduces the run time of "makewhatis /usr/share/man" from eleven
to five seconds on my notebook. Note that the changed code is not
used in apropos(1), so don't expect speedups there.
While here, sort macro values asciibetically, to improve reproducibility -
which still isn't perfect, but getting better.
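To illustrate why hashing beats a linear search here, a tiny fixed-size hash set with linear probing. This is not the ohash API and not the makewhatis(8) code; all names are hypothetical and the table never grows:

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64	/* power of two; the sketch never resizes */

struct table {
	const char	*keys[TABLE_SIZE];
	size_t		 count;
};

static size_t
hash_str(const char *s)
{
	size_t	 h = 5381;

	while (*s != '\0')
		h = h * 33 + (unsigned char)*s++;
	return h;
}

/* Return the slot holding key, or the empty slot where it belongs. */
static size_t
table_slot(const struct table *t, const char *key)
{
	size_t	 i = hash_str(key) & (TABLE_SIZE - 1);

	while (t->keys[i] != NULL && strcmp(t->keys[i], key) != 0)
		i = (i + 1) & (TABLE_SIZE - 1);
	return i;
}

/* Insert key; return 1 if added, 0 if already present, -1 if full. */
int
table_insert(struct table *t, const char *key)
{
	size_t	 i = table_slot(t, key);

	if (t->keys[i] != NULL)
		return 0;
	if (t->count == TABLE_SIZE - 1)
		return -1;	/* keep one slot empty so probing ends */
	t->keys[i] = key;
	t->count++;
	return 1;
}

int
table_contains(const struct table *t, const char *key)
{
	return t->keys[table_slot(t, key)] != NULL;
}
```

Each lookup inspects a handful of slots on average instead of every stored value, which matters when the check is repeated for every macro value while the tables are being built.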
Ingo Schwarze [Thu, 12 Jan 2017 18:02:20 +0000 (18:02 +0000)]
Skipping all escape sequences at the beginning of strings in deroff()
was too aggressive. There are strings that legitimately begin with
an escape sequence. Only skip leading escape sequences representing
whitespace.
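A sketch of the narrower rule. The set of escapes treated as whitespace (\ followed by space, 0, ^, | or ~) is an assumption for illustration, and skip_leading_space() is a hypothetical helper rather than the real deroff() code:

```c
#include <ctype.h>
#include <string.h>

/*
 * Skip leading whitespace, including escape sequences that stand
 * for whitespace.  Other escapes, such as font changes, are kept.
 */
const char *
skip_leading_space(const char *cp)
{
	while (*cp != '\0') {
		if (cp[0] == '\\' && cp[1] != '\0' &&
		    strchr(" 0^|~", cp[1]) != NULL)
			cp += 2;	/* whitespace escape */
		else if (isspace((unsigned char)*cp))
			cp++;		/* ordinary whitespace */
		else
			break;
	}
	return cp;
}
```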
Ingo Schwarze [Thu, 12 Jan 2017 15:45:05 +0000 (15:45 +0000)]
Put compiler arguments that may contain -l at the end; according to
the people at Alpine Linux, gcc 6 seems to fail when it's at the
beginning. From Daniel Sabogal via http://git.alpinelinux.org.
Ingo Schwarze [Wed, 11 Jan 2017 17:39:53 +0000 (17:39 +0000)]
Do text production for .Bt, .Ex, .Rv, .Ud at the validation stage
rather than in the formatters. Use NODE_NOSRC flag for .Lb and
NODE_NOSRC and NODE_NOPRT for .St. Results in a more rigorous
syntax tree and in 135 lines less code.
This work was triggered by a question from Abhinav Upadhyay <er dot
abhinav dot upadhyay at gmail dot com> (NetBSD) on discuss@.
Ingo Schwarze [Tue, 10 Jan 2017 21:59:47 +0000 (21:59 +0000)]
For the .Ux/.Ox family of macros, do text production at the validation
stage rather than in each and every individual formatter, using the
new NODE_NOSRC flag. More rigorous and also ten lines less code.
Ingo Schwarze [Tue, 10 Jan 2017 12:53:07 +0000 (12:53 +0000)]
Introduce flags NODE_NOSRC and NODE_NOPRT for AST nodes.
Use them to mark generated nodes and nodes that shall not produce output.
Let -Ttree output mode display these new flags.
Use NODE_NOSRC for .Ar, .Mt, and .Pa default arguments.
Use NODE_NOPRT for .Dd, .Dt, and .Os.
These will help to make handling of text production macros more rigorous.
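How such flags might be consulted, as a sketch only. The flag values, the reduced struct node and both helpers are hypothetical; the real definitions may differ:

```c
/* Hypothetical flag values; the real definitions may differ. */
#define NODE_NOSRC	(1 << 0)	/* generated, not in the source */
#define NODE_NOPRT	(1 << 1)	/* shall not produce output */

/* Much reduced stand-in for an AST node. */
struct node {
	const char	*text;
	int		 flags;
};

/* A formatter would skip nodes for which this returns 0. */
int
node_is_printed(const struct node *n)
{
	return (n->flags & NODE_NOPRT) == 0;
}

/* A source-oriented tool would skip nodes for which this returns 1. */
int
node_is_generated(const struct node *n)
{
	return (n->flags & NODE_NOSRC) != 0;
}
```

Keeping the two properties in separate bits lets a node be generated and printed, as for a default argument, or present in the source but silent.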
Ingo Schwarze [Mon, 9 Jan 2017 17:49:57 +0000 (17:49 +0000)]
Use stdout rather than stdin for controlling the terminal
such that "cat foo.mdoc | man -l" works.
Issue reported by Christian Neukirchen <chneukirchen at gmail dot com>
and also tested by him on Void Linux with both glibc and musl.
The patch makes sense to millert@.