Ingo Schwarze [Thu, 19 Jan 2017 01:00:14 +0000 (01:00 +0000)]
Implement line breaking of the generated HTML code at space characters
in filled text. This does not affect HTML semantics, but makes the
HTML code even more human-readable.
While here,
- collapse multiple consecutive space characters in filled text
- and insert a blank between style entries.
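The space handling described above can be sketched roughly as follows. This is an illustration in Python rather than mandoc's C; the helper name fill_html_text and the width parameter are assumptions, since the commit message does not state the line length at which output is broken.

```python
def fill_html_text(text, width=72):
    """Collapse runs of spaces, then break lines at spaces.

    Hypothetical helper for illustration only; the width limit is an
    assumed value, not one taken from mandoc.
    """
    # Splitting on a single space and dropping empty pieces collapses
    # multiple consecutive space characters into one separator.
    words = [w for w in text.split(" ") if w]
    lines, current = [], ""
    for word in words:
        if current and len(current) + 1 + len(word) > width:
            # The next word would overflow: break at this space.
            lines.append(current)
            current = word
        elif current:
            current += " " + word
        else:
            current = word
    if current:
        lines.append(current)
    return "\n".join(lines)
```

Because a line break and a space are both whitespace in filled HTML text, breaking only at spaces leaves the rendered result unchanged.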
Ingo Schwarze [Wed, 18 Jan 2017 19:22:21 +0000 (19:22 +0000)]
Make HTML output more human readable by overhauling line break logic
around tags and by introducing some simple indentation.
No change of HTML semantics intended.
Ingo Schwarze [Tue, 17 Jan 2017 15:32:43 +0000 (15:32 +0000)]
Completely delete the buf field of struct html and all the buf*()
interfaces. Such a static buffer was a bad idea in the first place,
causing unfixable truncation that was only prevented by triggering
an assertion failure. Instead, let the small number of remaining
users allocate and free their own, temporary dynamic buffers,
or for the case of .Xr and .In, pass the original data to be
assembled in print_otag().
Ingo Schwarze [Sun, 15 Jan 2017 15:28:55 +0000 (15:28 +0000)]
When looking up macro values while the macro tables are being built
in makewhatis(8), use ohash rather than linear searches.
This was identified as the main makewhatis(8) performance bottleneck
by Baptiste Daroussin <bapt at FreeBSD>, who also suggested part
of the improved algorithm.
This reduces the run time of "makewhatis /usr/share/man" from eleven
to five seconds on my notebook. Note that the changed code is not
used in apropos(1), so don't expect speedups there.
While here, sort macro values asciibetically, to improve reproducibility -
which still isn't perfect, but getting better.
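A rough sketch of the idea, in Python rather than C: a dict stands in for ohash, so each macro value is found in constant average time instead of by scanning a list, and the values are emitted in byte order afterwards. The function name build_macro_table and the (value, page) input shape are assumptions made for illustration.

```python
def build_macro_table(pairs):
    """Group pages by macro value with a hash lookup, then sort the values.

    pairs is an iterable of (value, page) tuples; this input shape is
    an assumption, not mandoc's actual data layout.
    """
    table = {}  # hash table: macro value -> set of pages
    for value, page in pairs:
        # Average O(1) lookup instead of a linear search over all values.
        table.setdefault(value, set()).add(page)
    # Sort by raw byte values ("asciibetically") for reproducible output.
    ordered = sorted(table, key=lambda s: s.encode("utf-8"))
    return [(value, sorted(table[value])) for value in ordered]
```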
Ingo Schwarze [Thu, 12 Jan 2017 18:02:20 +0000 (18:02 +0000)]
Skipping all escape sequences at the beginning of strings in deroff()
was too aggressive. There are strings that legitimately begin with
an escape sequence. Only skip leading escape sequences representing
whitespace.
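The narrower rule might look like the sketch below, written in Python for brevity. The set of escapes treated as whitespace here (\ followed by space, ~, 0, | or ^) is an assumption chosen from common roff spacing escapes; the exact set used by deroff() is not given in the commit message.

```python
# Characters that, after a backslash, are assumed to form a roff
# escape sequence representing some kind of space.
SPACE_ESCAPES = {" ", "~", "0", "|", "^"}

def skip_leading_space(s):
    """Strip leading whitespace and whitespace escapes, nothing else."""
    i = 0
    while i < len(s):
        if s[i] == "\\" and i + 1 < len(s) and s[i + 1] in SPACE_ESCAPES:
            i += 2          # skip a two-character whitespace escape
        elif s[i].isspace():
            i += 1          # skip plain whitespace
        else:
            break           # any other escape is legitimate content
    return s[i:]
```

With this rule a string starting with a font or character escape keeps that escape, while leading spacing escapes are still dropped.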
Ingo Schwarze [Thu, 12 Jan 2017 15:45:05 +0000 (15:45 +0000)]
Put compiler arguments that may contain -l at the end; according to
the people at Alpine Linux, gcc 6 seems to fail when it's at the
beginning. From Daniel Sabogal via http://git.alpinelinux.org.
Ingo Schwarze [Wed, 11 Jan 2017 17:39:53 +0000 (17:39 +0000)]
Do text production for .Bt, .Ex, .Rv, .Ud at the validation stage
rather than in the formatters. Use NODE_NOSRC flag for .Lb and
NODE_NOSRC and NODE_NOPRT for .St. This results in a more rigorous
syntax tree and in 135 fewer lines of code.
This work was triggered by a question from Abhinav Upadhyay <er dot
abhinav dot upadhyay at gmail dot com> (NetBSD) on discuss@.
Ingo Schwarze [Tue, 10 Jan 2017 21:59:47 +0000 (21:59 +0000)]
For the .Ux/.Ox family of macros, do text production at the validation
stage rather than in each and every individual formatter, using the
new NODE_NOSRC flag. More rigorous and also ten fewer lines of code.
Ingo Schwarze [Tue, 10 Jan 2017 12:53:07 +0000 (12:53 +0000)]
Introduce flags NODE_NOSRC and NODE_NOPRT for AST nodes.
Use them to mark generated nodes and nodes that shall not produce output.
Let -Ttree output mode display these new flags.
Use NODE_NOSRC for .Ar, .Mt, and .Pa default arguments.
Use NODE_NOPRT for .Dd, .Dt, and .Os.
These will help to make handling of text production macros more rigorous.
Ingo Schwarze [Mon, 9 Jan 2017 17:49:57 +0000 (17:49 +0000)]
Use stdout rather than stdin for controlling the terminal
such that "cat foo.mdoc | man -l" works.
Issue reported by Christian Neukirchen <chneukirchen at gmail dot com>
and also tested by him on Void Linux with both glibc and musl.
The patch makes sense to millert@.
Ingo Schwarze [Mon, 9 Jan 2017 12:48:58 +0000 (12:48 +0000)]
The .No macro is not supposed to produce fixed-width font, it is not
the same as .Li, so don't use <code>.
Bug reported by <Anton dot Lindqvist at gmail dot com> on tech@.
Ingo Schwarze [Mon, 9 Jan 2017 01:37:03 +0000 (01:37 +0000)]
Warnings and errors that occur during mdoc_validate()
or during man_validate() have to affect the mandoc(1) EXIT STATUS.
Many thanks to <Yuri dot Pankov at gmail dot com> (illumos developer)
for reporting this regression.
Ingo Schwarze [Sun, 8 Jan 2017 22:51:55 +0000 (22:51 +0000)]
Indentation must be measured in units of the surrounding text,
not in units of the contained text. Consequently, "display"
and "lit" class tags must not be on the same element: First,
"display" must set up the indentation, still using the outer
units, and only after that, "lit" may change the font.
This fixes .Bd -literal which got the wrong indentation.
Bug reported by tb@.
Ingo Schwarze [Sun, 8 Jan 2017 02:01:17 +0000 (02:01 +0000)]
Tolerate bare tabs in SYNOPSIS .Cd for now.
It's used in half a dozen pages.
Even though I have been thinking about it for years,
I still can't suggest anything better.
The false positives are annoying.
Ingo Schwarze [Sun, 8 Jan 2017 00:11:23 +0000 (00:11 +0000)]
Stricter validation of the NAME section, in particular:
- require a comma between names
- reject all other text nodes
- reject all empty Nm below NAME, not only in the leading position
- reject Nm after Nd
Ingo Schwarze [Wed, 28 Dec 2016 17:34:18 +0000 (17:34 +0000)]
Make the second, section number argument of .Xr mandatory.
In fact, we have been requiring it for many years.
The only reason to not warn when it was missing
was excessive traditionalism - it was optional in 4.4BSD.
Ingo Schwarze [Wed, 7 Dec 2016 22:59:29 +0000 (22:59 +0000)]
When reporting "whitespace at end of input line" on lines ending with
roff(7) comments, let the column number in the message point to the
end of the line rather than to the beginning of the comment.
Improvement suggested by bluhm@.
Ingo Schwarze [Sat, 19 Nov 2016 15:24:51 +0000 (15:24 +0000)]
Do not install libmandoc.a by default.
The only environment where it is ever needed is NetBSD base.
Even NetBSD ports and pkgsrc should better not install it.
Triggered by a question from bentley@.
Ingo Schwarze [Tue, 8 Nov 2016 16:23:58 +0000 (16:23 +0000)]
Implement tag priority 0, which tags only keys that appear as
tag candidates exactly once, and use it for .Em and .Sy.
Written on the TGV Toulouse-Paris.
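The "exactly once" rule can be illustrated with a short Python sketch. The function name unique_tag_keys and the flat list of candidate keys are assumptions; mandoc's own tag data structures are not shown in this log.

```python
from collections import Counter

def unique_tag_keys(candidates):
    """Return the keys that occur exactly once among the tag candidates.

    candidates is a list of key strings in document order (an assumed
    input shape for illustration).
    """
    counts = Counter(candidates)
    # Keys seen more than once are ambiguous and therefore not tagged.
    return [key for key in candidates if counts[key] == 1]
```

For example, unique_tag_keys(["foo", "bar", "foo", "baz"]) keeps "bar" and "baz" and drops the repeated "foo".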
Ingo Schwarze [Tue, 18 Oct 2016 22:27:25 +0000 (22:27 +0000)]
The termination condition of the iteration logic in page_bymacro()
was overzealous. Consequently, macro=substr and macro~regexp searches
only returned all pages containing the first matching macro value,
rather than all pages containing any of the matching macro values.
Bug reported by tb@ - thanks!
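The intended behaviour, sketched in Python with an assumed dict from macro value to pages: the loop must keep going after the first matching value and collect the pages of every match. The name pages_by_macro and the table layout are hypothetical and do not reflect the mandoc.db(5) format.

```python
import re

def pages_by_macro(table, pattern):
    """Return all pages containing ANY macro value matching pattern.

    table maps macro values to lists of page names (assumed layout).
    """
    rx = re.compile(pattern)
    result = set()
    for value, pages in table.items():
        if rx.search(value):
            # Do not stop here: later values may match as well.
            result.update(pages)
    return sorted(result)
```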
Ingo Schwarze [Tue, 18 Oct 2016 16:06:44 +0000 (16:06 +0000)]
Compat glue for the FreeBSD comparison function prototype for fts_open(3)
which differs from what most other systems use.
While here, improve diagnostic output of ./configure tests.
Ingo Schwarze [Tue, 18 Oct 2016 14:15:33 +0000 (14:15 +0000)]
Simplify and correct support for reproducible builds, such that database
entries come in a well-defined order even in the presence of MLINKS.
Do this by using the compar() argument of fts_open(3) rather than
trying to sort later, which missed some cases.
This also shortens the code by a few lines.
Diff from Ed Maste <emaste @ FreeBSD>, adapted to our tree
and tweaked a bit by me, final version confirmed by Ed.
Ingo Schwarze [Sun, 9 Oct 2016 18:16:56 +0000 (18:16 +0000)]
Delete complicated code dealing with .Bl -tag without -width,
and just let it default to -width 6n, which agrees with the
traditional -width Ds that is still in widespread use.
I just pushed a patch upstream to GNU roff that does the same for
groff_mdoc(7). Before, groff contained code that was even more
complicated than mandoc, but both resulted in quite different
user-visible output. Now, both agree, and output is nicer for both.
Useless complication noticed by Carsten Kunze (Heirloom roff).
We cannot use fputs(3) in passthrough() because the stdout stream
might be in stdio wide orientation due to prior formatting of an
unformatted manual in man -aTutf8 mode. So for now, use fflush(3)
followed by unbuffered write(2) instead. Fixes output corruption
on glibc discovered on Linux while testing a diff to fix a loosely
related bug reported by <jmates at ee dot washington dot edu>.
I detest the concept of stdio stream orientation. One day, I will
rewrite term_ascii.c to always use narrow streams, even in UTF-8
output mode. But that's too much work for today.
Make sure an output device is allocated before calling terminal_sepline(),
fixing a NULL pointer access that happened when the first of multiple pages
shown was preformatted, as in "man -a groff troff".
Crash reported by <jmates at ee dot washington dot edu> on bugs@, thanks!
When "makewhatis -d" tries to add to a database that doesn't (yet) exist,
silently create it from scratch instead of printing a warning.
The annoying warning message was reported by ajacoutot@, and espie@
convincingly argues that a non-existing database can be considered
equivalent to an empty one.
Ingo Schwarze [Tue, 30 Aug 2016 22:01:07 +0000 (22:01 +0000)]
When the database is corrupt in the sense of containing invalid
pointers in the pages table, do not access NULL pointers, but
gracefully handle the errors.
Similar patches will be needed for the macro tables, too.
<attila at stalphonsos dot com> audited the code and pointed out to me
that dbm_get() can return NULL for corrupted databases, but that isn't
handled properly at various places.
Ingo Schwarze [Sun, 28 Aug 2016 16:15:12 +0000 (16:15 +0000)]
If a line inside .Bl -column starts with a tab character
and there was no preceding .It macro, do not read the byte
before the beginning of the line buffer.
Found by tb@ with afl(1).
Ingo Schwarze [Mon, 22 Aug 2016 16:15:26 +0000 (16:15 +0000)]
When trying to edit an existing database with makewhatis(8) -d or -u
but reading the database fails, report the full path to the database
on standard error, and mention that the database is automatically
recreated from scratch.
Suggested by espie@.
Ingo Schwarze [Mon, 22 Aug 2016 16:07:16 +0000 (16:07 +0000)]
When running into a mandoc.db(5) file still using the obsolete
format based on SQLite 3, say so in words that mortals can
understand rather than babbling about hex magic.
Suggested by espie@.
Ingo Schwarze [Sat, 20 Aug 2016 17:59:34 +0000 (17:59 +0000)]
When a mismatching end macro occurs while at least two nested blocks
are open, all except the innermost open block got a bogus MDOC_ENDED
marker, in some situations triggering segfaults down the road
which tb@ found with afl(1).
Fix the logic error by figuring out up front whether an end macro
has a matching body, and if it hasn't, don't mark any blocks as broken.
Ingo Schwarze [Sat, 20 Aug 2016 15:58:21 +0000 (15:58 +0000)]
When scanning upwards for a column list to put a .Ta macro in,
ignore body end markers of lists breaking other blocks.
Fixing a logical error that caused a NULL deref found by tb@ with afl(1).
Ingo Schwarze [Sat, 20 Aug 2016 14:43:50 +0000 (14:43 +0000)]
If a column list starts with implicit rows (that is, rows without .It)
and roff-level nodes (e.g. tbl or eqn) follow, don't run into an
assertion. Instead, wrap the roff-level nodes in their own row.
Issue found by tb@ with afl(1).
Ingo Schwarze [Wed, 17 Aug 2016 20:46:56 +0000 (20:46 +0000)]
When the content of a manual page does not specify a section, the
empty string got added to the list of sections, breaking the database
format slightly and causing the page to not be considered part of
any section, not even if a section could be deduced from the directory
or from the file name.
Bug found due to the bogus pcredemo(3) "manual" in the pcre-8.38p0 package.
Ingo Schwarze [Wed, 17 Aug 2016 18:59:37 +0000 (18:59 +0000)]
When reading back a mandoc.db(5) file in order to apply incremental
changes, do not prepend a stray NAME_FILE (0x10) byte to the first
names of pages.
Bug found while investigating another issue reported by sthen@.
Ingo Schwarze [Wed, 17 Aug 2016 18:10:39 +0000 (18:10 +0000)]
Make sure manuals in architecture-independent directories are treated
as architecture-independent even if they abuse the third (architecture)
argument of the .Dt macro for random stuff like "freetds reference manual".
While the .Dt syntax is not the same as the .TH syntax in man(7),
punishing offenders by treating them as architecture-dependent and
hence completely excluding them from searches is too severe.
Problem reported by sthen@.
Ingo Schwarze [Thu, 11 Aug 2016 13:30:25 +0000 (13:30 +0000)]
Even after switching from a pending head to the body, we have to
continue scanning upwards, because the enclosing block might already
be pending as well, e.g. .Bl .Bl .It Bo .El .It.
Tree corruption leading to a later NULL deref found by tb@ with afl(1).
Ingo Schwarze [Thu, 11 Aug 2016 10:47:16 +0000 (10:47 +0000)]
If a .Bd display is on the one hand doomed to be deleted because
it has no type, but is on the other hand breaking another block,
delete its end marker as well, or the end marker may remain behind
as an orphan, triggering an assertion in the terminal formatter.
Problem found by tb@ with afl(1).
Ingo Schwarze [Wed, 10 Aug 2016 20:17:50 +0000 (20:17 +0000)]
Don't deref NULL if the only child of the first .Sh is an empty
in-line macro, and don't printf("%s", NULL) if the first child
of the first .Sh is a macro; again found by tb@ with afl(1).
(No, you should never use macros in any .Sh at all, please.)
Ingo Schwarze [Wed, 10 Aug 2016 12:50:24 +0000 (12:50 +0000)]
When trying to figure out which C compiler make(1) wants to use,
pass it the POSIX -s option. On most systems, this won't make a
difference, but Bdale Garbee reported that the make(1) on his Debian
system, most likely some version of gmake, breaks Makefile.local
by printing some 'entering directory' messages. I failed to reproduce
and Bdale didn't report back, but judging from gmake source code,
this is likely to help and unlikely to do harm elsewhere.
Ingo Schwarze [Wed, 10 Aug 2016 12:06:41 +0000 (12:06 +0000)]
When validating a .Bl list that defaults to -item for want of a type,
don't let a subsequent -width access mdoc_argnames[] out of bounds.
Found by tb@ with afl(1).
Ingo Schwarze [Wed, 10 Aug 2016 11:03:43 +0000 (11:03 +0000)]
Fix assertion failures caused by whitespace inside \o'' (overstrike)
sequences that jsg@ found with afl(1):
* Avoid writing \t\b in term.c.
* Handle trailing \b in term_ps.c.
Ingo Schwarze [Fri, 5 Aug 2016 23:15:08 +0000 (23:15 +0000)]
The concept of endianness seems to be somewhat newfangled, so the
respective conversion functions are not yet properly standardized.
Rumour has it that POSIX is working on it, though.
For now, sprinkle some configuration glue.
Ingo Schwarze [Thu, 4 Aug 2016 09:33:57 +0000 (09:33 +0000)]
Fix an assertion failure that happened when trying to add a page
with makewhatis -d to a completely empty database.
Reported by Mark Patruck <mark at wrapped dot cx>, thanks!
Ingo Schwarze [Tue, 2 Aug 2016 11:09:46 +0000 (11:09 +0000)]
POSIX allows PATH_MAX to not be defined, meaning "unlimited".
Found by Aaron M. Ucko <amu at alum dot mit dot edu> on the GNU Hurd,
via Bdale Garbee, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=829624
Also add EFTYPE at two places where it was forgotten.
Some base system pages, for example perl(1), contain non-ASCII
characters in their source code, so switch on charset autodetection
in the same way as in man(1) itself.
Issue reported by Pavan Maddamsetti at gmail dot com on bugs@.
Autodetect a suitable locale for -Tutf8 mode,
and allow overriding it manually.
Based on a patch from Svyatoslav Mishyn <juef at openmailbox dot org>
tweaked by me.
The idea originally came up in a conversation with Markus Waldeck.
No need to populate the TYPE_arch and TYPE_sec bits, the information
is provided directly to dba_page_add() in dbadd_mlink()
and to dba_page_new() in dbadd().
No need for a dedicated loop for NAME_FILE.
It's done in dbadd_mlink() anyway.
In this context, also record section numbers taken from filenames
and from .Dt and .TH macros, architectures taken from .Dt macros,
and fix the filtering of duplicate filename entries.