Ingo Schwarze [Sun, 10 Nov 2019 22:35:25 +0000 (22:35 +0000)]
Add a Content-Security-Policy HTTP header that allows only CSS.
This ensures that in a modern browser that understands the header,
mandoc rendering bugs cannot possibly be interpreted as JavaScript.
Patch from bentley@.
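As a rough illustration of the idea, a CGI program could emit the header as sketched below. The function name and the exact policy string are assumptions made for this sketch; the commit message only says that the policy allows nothing but CSS.

```c
#include <stdio.h>

/*
 * Assumed policy value: forbid everything by default and allow
 * only style sheets.  Not copied from the patch.
 */
static const char csp_header[] =
    "Content-Security-Policy: default-src 'none'; "
    "style-src 'self' 'unsafe-inline'";

/*
 * Hypothetical helper: write the HTTP response headers to out.
 * Header lines end in CRLF; an empty line ends the header block.
 * Returns 0 on success and -1 on a write error.
 */
int
print_http_headers(FILE *out)
{
	if (fprintf(out, "Content-Type: text/html; charset=utf-8\r\n") < 0)
		return -1;
	if (fprintf(out, "%s\r\n\r\n", csp_header) < 0)
		return -1;
	return 0;
}
```

With a policy of this shape, a browser honoring the header refuses to run any script found in the page, whatever the formatter printed.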
Ingo Schwarze [Sat, 9 Nov 2019 14:39:49 +0000 (14:39 +0000)]
In the past, generating comment nodes stopped at the .TH or .Dd
macro, which is usually close to the beginning of the file, right
after the Copyright header comments. But espie@ found horrible
input files in the textproc/fstrcmp port that generate lots of parse
nodes before even getting to the header macro. In some formatters,
comment nodes after some kinds of real content triggered assertions.
So make sure generation of comment nodes stops once real content is
encountered.
Ingo Schwarze [Tue, 1 Oct 2019 17:54:14 +0000 (17:54 +0000)]
For invalid queries and for valid queries returning no result,
return the appropriate 40x status code rather than 200.
Improvement suggested and diff tested
by John Gardner <gardnerjohng at gmail dot com>.
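The status selection might look roughly like the sketch below. The helper name is hypothetical, and the choice of 400 and 404 is an assumption, since the message only speaks of "the appropriate 40x status code".

```c
#include <stddef.h>

/*
 * Hypothetical helper: choose the HTTP status for a search request.
 * Assumption: invalid queries map to 400, empty results to 404.
 */
int
query_status(int query_is_valid, size_t nresults)
{
	if (!query_is_valid)
		return 400;	/* Bad Request */
	if (nresults == 0)
		return 404;	/* Not Found */
	return 200;		/* OK */
}
```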
Fix line breaking in no-fill mode (.Bd -unfilled/<pre>),
which apparently didn't work since the .Pp/<p> reorg.
The new logic is more similar to what the terminal formatter does:
1. Before a node that starts a new mdoc(7) input line,
start a new HTML output line.
2. An empty input line or a .Pp causes an empty output line.
3. Nothing needs to be done at the end of a node.
Severe misformatting was reported in table(5) by
Edgar Pettijohn <edgar at pettijohn dash web dot com> on misc@.
Improve validation of function names:
1. Relax checking to accept function types of the form
"ret_type (fname)(args)" (suggested by Yuri Pankov <yuripv dot net>).
2. Tighten checking to require the closing parenthesis.
Do not clear HTML_NOSPACE in print_indent().
I don't think there ever was a reason for doing so.
Besides, there is a discrepancy with respect to the point in the
document affected. That flag controls whitespace at the current
formatting point. But when HTML_BUFFER is in effect, the line break
and indentation is typically inserted one word further to the left.
Anything happening at that point to the left can't reasonably
influence spacing at the different point further to the right.
Among other effects, this change avoids some spurious line breaks
in HTML code at points where they weren't supposed to happen, line
breaks that in some cases caused undesirable, visible whitespace
when the resulting HTML was rendered.
Wrap text and phrasing elements in paragraphs unless already
contained in flow containers; never put them directly into sections.
This helps to format paragraphs with the CSS class selector .Pp.
Suggested by bentley@ and also by Colin Watson <cjwatson at debian>
via Michael Stapelberg <stapelberg at debian>,
see https://github.com/Debian/debiman/issues/116
Format .Nd more logically with <span> rather than <div>; after all,
it is supposed to be a one-line description. For the case where .Nd
generates flow content (which is very bad style but syntactically
valid), rely on the new feature of html_close_paragraph() to close
out the <span> prematurely, effectively moving the flow content out
of the .Nd for HTML presentation. For the final closing, also rely
on the new html_close_paragraph() functionality, this time triggered
by the subsequent block, which will typically be .Sh SYNOPSIS.
Make html_close_paragraph() more versatile, more robust, less
dependent on individual HTML elements, and simpler: don't just close
<p>, <pre>, and <a>, but any element that establishes phrasing
context. This doesn't change output for any OpenBSD manual page,
but it will allow using this function more safely and at more places
in the future.
Ingo Schwarze [Thu, 29 Aug 2019 17:57:29 +0000 (17:57 +0000)]
In the HTML formatter, assert(3) that no HTML nesting violation occurs.
Tested on the complete manual page trees of Version 7 AT&T UNIX,
4.4BSD-Lite2, POSIX-2013, OpenBSD 2.2 to 6.5 and -current,
FreeBSD 10.0 to 12.0, NetBSD 6.1.5 to 8.1, DragonFly 3.8.2 to 5.6.1,
and Linux 4.05 to 5.02.
Simplification, no functional change:
Delete the "argc" argument from fs_search() which is now always 1,
and move error reporting to the main() program where it is more
logically placed and easier to see.
In man(1) mode, do the search for each name independently, and
show the results in the order of the command line arguments.
Implemented by separating the code for man(1) and apropos(1)
in the main() program.
Surprisingly, the number of lines of code remains unchanged.
Issue reported by deraadt@, additional input from millert@.
Cleanup, no functional change:
For clarity, stop storing the same information (in this case, -O
settings) in two structs. Give the local struct in main.c a more
descriptive name (output state).
Structural cleanup, no functional change:
Mixing parser and formatter state in the same struct was a bad idea,
so pull the parser state and configuration out of it.
This makes sure output options are not passed into parser functions
and parser options are not passed into output functions.
While here, add comments to the important local variables in main().
Structural cleanup, no functional change:
Move process group management out of main() into its own function
because it has its own, self-contained logic and its own local variables.
Slowly start implementing tagging support for man(7) pages, even
though it is obvious that this can never become as good as for
mdoc(7) pages. As a first step, tag alphabetic arguments of .IP
macros, which are often used for lists of options and keywords.
Try "man -O tag=g as" to get the point.
Thanks to Leah Neukirchen for recently reminding me that exploring
how much can be done in this respect may be worthwhile: it is likely
to slightly improve usability while adding only small amounts of
relatively straightforward code.
If no tags were generated at all, unlink(2) the empty tags file as
soon as the condition can be detected and do not pass it to less(1).
This may happen for man(7) pages, for preformatted pages, and for
very simple pages like true(1). The main benefit is that :t inside
less(1) yields the clearer diagnostic message "No tags file" rather
than the mildly confusing "No such tag in tags file": the latter
might encourage further, futile attempts to jump to other tags.
Improvement suggested by Leah Neukirchen <leah at vuxu dot org>
from The Void.
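A minimal sketch of the decision, assuming the caller knows the path of the tags file and the number of tags written; the function name and its parameters are invented for this illustration.

```c
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: decide which tags file, if any, to hand to
 * the pager.  When no tags were written, remove the empty file and
 * return NULL so that the pager is started without a tags file.
 */
const char *
tags_file_for_pager(const char *path, size_t ntags)
{
	if (ntags > 0)
		return path;
	(void)unlink(path);	/* the empty file is useless */
	return NULL;
}
```

A caller would pass the tags file to less(1) only when the return value is not NULL.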
If messages are shown and output is printed without a pager, display
a heads-up on stderr at the end because otherwise, users may easily
miss the messages: because messages typically occur while parsing,
they typically precede the output. This is most useful with flag
combinations like "-c -W all" but may also help in some unusual
error scenarios.
Inconvenient ordering of output originally pointed out by espie@
for the example situation that /tmp/ is not writeable.
When parsing a tab character that is not preceded by a space character
on an .It -column line, args() sets the MDOC_PHRASEQL flag to Quote
the Last word of the Phrase. Even if it turns out this quoting is not
needed because the word is already quoted for other reasons, clear the
flag at the end of parsing the phrase, such that the flag does not leak
to the next phrase.
This patch fixes the bug that the trailing Macro on a line of the form
.It "word<tab>word" Ta word Macro<eol>
was incorrectly considered quoted and hence not parsed.
Bug found by Havard Eidnes (he@) with the NetBSD gettytab(5) manual page:
https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54361
Reported via Thomas Klausner (wiz@).
Some time ago, i simplified mandoc_msg() such that it can be used
everywhere and not only in the parsers.
For more uniform messages, use it at more places instead of err(3),
in particular in the main program.
While here, integrate a few trivial functions called at exactly one
place into the main option parser, and let a few more functions use
the normal convention of returning 0 for success and -1 for error.
The non-standard .EX/.EE macro pair was invented for Version 9 AT&T UNIX
and only got adopted by GNU two decades later.
Thanks to Doug McIlroy <doug at cs dot dartmouth dot edu>
for pointing out the error.
delete trailing whitespace and space-tab sequences; no code change;
patch from Michal Nowak <mnowak at startmail dot com>
who found these with git pbchk in the illumos tree
Ingo Schwarze [Thu, 27 Jun 2019 15:07:30 +0000 (15:07 +0000)]
Fix mandoc_normdate() and the way it is used.
In the past, it could return NULL but the calling code wasn't prepared
to handle that. Make sure it always returns an allocated string.
While here, simplify the code by handling the "quick" attribute
inside mandoc_normdate() rather than at multiple callsites.
Triggered by deraadt@ pointing out
that snprintf(3) error handling was incomplete in time2a().
Ingo Schwarze [Thu, 27 Jun 2019 12:20:18 +0000 (12:20 +0000)]
Improve "man -h" output.
1. For pages lacking a SYNOPSIS, show the NAME section rather than nothing.
2. Do not print a stray blank before the beginning of a SYNOPSIS.
Both issues reported by, and patch OK'ed by, tb@.
Ingo Schwarze [Tue, 11 Jun 2019 16:04:36 +0000 (16:04 +0000)]
Do not access a NULL pointer if a table contains a horizontal line
next to a table line having fewer columns than the table as a whole.
Bug found by Stephen Gregoratto <dev at sgregoratto dot me>
with aerc-config(5).
Ingo Schwarze [Mon, 3 Jun 2019 20:23:41 +0000 (20:23 +0000)]
Explicitly state that the cases in the inner switch in term_fill()
are exhaustive. While there is no bug, being explicit has no downside
and is potentially safer for the future.
Michal Nowak <mnowak at startmail dot com> reported that gcc 4.4.4
and 7.4.0 on illumos throw -Wuninitialized false positives.
Ingo Schwarze [Mon, 3 Jun 2019 19:58:02 +0000 (19:58 +0000)]
Initialize the local variable "lastln" in mparse_buf_r().
While there is no bug, it logically makes sense given the meaning
of the variable that lastln is NULL as long as firstln is NULL.
Michal Nowak <mnowak at startmail dot com> reported that gcc 4.4.4
and 7.4.0 on illumos throw -Wuninitialized false positives.
Ingo Schwarze [Mon, 3 Jun 2019 19:50:33 +0000 (19:50 +0000)]
Initialize the local variable "act" in print_mdoc_node().
While there is no bug, it helps clarity, and it is also safer in this
particular code because in case a bug gets introduced later, accessing
a NULL pointer is less dangerous than accessing an uninitialized pointer.
Michal Nowak <mnowak at startmail dot com> reported that gcc 4.4.4
and 7.4.0 on illumos throw -Wuninitialized false positives.
Ingo Schwarze [Tue, 21 May 2019 08:04:21 +0000 (08:04 +0000)]
Do not print the style message "missing date" when the date is given
as "$Mdocdate$" without an actual date. That is the canonical way to
write a new manual page and not bad style at all.
Misleading message reported by kn@ on tech@.
Ingo Schwarze [Fri, 3 May 2019 18:17:12 +0000 (18:17 +0000)]
Enter dangling .so links into the database, to avoid harassing
users of man(1) about running makewhatis(8), which won't help.
Seeing the content of the broken .so request might even help
users to figure out how to access the manual page they want.
Fixing the last issue reported by Lorenzo Beretta <loreb at github>
as part of https://github.com/void-linux/void-packages/issues/9868 .
Ingo Schwarze [Fri, 3 May 2019 17:31:15 +0000 (17:31 +0000)]
In fs_lookup(), use stat(2) rather than access(2) to check file existence.
Some mildly broken real-world packages on some operating systems
contain dangling symlinks in manual page directories: pestering the
user to run makewhatis(8) makes no sense because that won't help.
On the other hand, missing read permissions deserve ugly error messages
and are unlikely to occur in practice anyway.
Fixing an issue reported by Lorenzo Beretta <loreb at github>
as part of https://github.com/void-linux/void-packages/issues/9868 .
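For illustration, an existence check based on stat(2) could be as small as the sketch below. The function name is hypothetical, and this is not the code of fs_lookup(); it only shows how stat(2) treats the two cases mentioned above.

```c
#include <sys/stat.h>

/*
 * Sketch: report whether a path names an existing file.  stat(2)
 * follows symbolic links, so a dangling symlink counts as missing,
 * while a file that exists but cannot be read still counts as
 * present.
 */
int
file_exists(const char *path)
{
	struct stat sb;

	return stat(path, &sb) == 0;
}
```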
Ingo Schwarze [Fri, 3 May 2019 16:14:41 +0000 (16:14 +0000)]
In man(1) mode with a specific section requested,
try harder to find the best match.
Use this order of preference:
1. The section in both the directory name and the file name matches exactly.
2. The section in the file name matches exactly.
3. The section in the directory name matches exactly.
4. Neither of them matches exactly.
The latter can happen when mansearch() finds substring matches
or when the second .Dt argument mismatches the dir and file names.
Lorenzo Beretta <loreb at github> reported that this caused real
problems on Void Linux, like "man 3 readline" showing readline(3m).
See https://github.com/void-linux/void-packages/issues/9868 for details.
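The order of preference listed above can be sketched as a small ranking function. The name and the boolean parameters are assumptions for this example; how the real code derives the two conditions is not shown here.

```c
/*
 * Hypothetical ranking helper: lower values are better matches.
 * The inputs say whether the requested section exactly equals the
 * section in the directory name and in the file name, respectively.
 */
int
section_match_rank(int dir_matches, int file_matches)
{
	if (dir_matches && file_matches)
		return 1;
	if (file_matches)
		return 2;
	if (dir_matches)
		return 3;
	return 4;
}
```

Picking the candidate with the smallest rank would then prefer readline(3) over readline(3m) for "man 3 readline".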
Ingo Schwarze [Fri, 3 May 2019 09:39:25 +0000 (09:39 +0000)]
In man(1) mode, when the first argument starts with a digit,
optionally followed by a letter, and at least one more argument
follows, interpret the first argument as a section name even when
additional characters follow after the digit and letter.
This is needed because many operating systems have section names
consisting of a digit followed by more than one letter - for example
Illumos, Solaris, Linux, even NetBSD.
There is very little risk of regressions: in the whole corpus of
manual pages on man.openbsd.org, there isn't a single manual page
name starting with a digit. And even if programs like "0ad" or
"4channels" had manual pages, "man 0ad" and "man -a cat 0ad" would
still work, only "man -a 0ad cat" will fail with "man: No entry for
cat in section 0ad of the manual."
Fixing one of the issues reported by Lorenzo Beretta <loreb at github>
as part of https://github.com/void-linux/void-packages/issues/9868 .
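Read literally, the new rule reduces to a check of the first character plus the argument count. The sketch below assumes that reading; the function name is hypothetical and argv is taken to hold only the arguments remaining after option parsing.

```c
#include <ctype.h>

/*
 * Hypothetical helper: decide whether argv[0] should be read as a
 * section name.  A leading digit suffices, provided that at least
 * one more argument (the page name) follows.
 */
int
first_arg_is_section(int argc, char *const argv[])
{
	if (argc < 2)
		return 0;
	return isdigit((unsigned char)argv[0][0]) != 0;
}
```

This matches the examples in the message: "man 0ad" still looks up a page, whereas "man -a 0ad cat" treats 0ad as a section.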
In man(1) mode, i.e. when asking for a single manual page by name,
prefer file name matches over .Dt/.TH matches over first NAME matches
over later NAME matches, but do not change the ordering for apropos(1)
nor for man -a.
This reverts main.c rev. 1.310 and mansearch.h rev. 1.29
and includes a partial revert of mansearch.c rev. 1.79.
Regression reported by Lorenzo Beretta <loreb at github>
as part of https://github.com/void-linux/void-packages/issues/9868 .
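One way to express such a preference order is a qsort(3) comparator over a match kind, as in the sketch below. The enum, the struct, and the function names are invented for illustration and do not reflect the actual data structures in mansearch.c.

```c
#include <stdlib.h>

/* Hypothetical match kinds, ordered from most to least preferred. */
enum match_kind {
	MATCH_FILENAME,		/* the file name matches */
	MATCH_TITLE,		/* the .Dt or .TH title matches */
	MATCH_FIRST_NAME,	/* the first NAME entry matches */
	MATCH_LATER_NAME	/* a later NAME entry matches */
};

struct page {
	const char	*file;
	enum match_kind	 kind;
};

/* qsort(3) comparator: smaller match kinds sort first. */
static int
page_cmp(const void *a, const void *b)
{
	const struct page *pa = a, *pb = b;

	return (pa->kind > pb->kind) - (pa->kind < pb->kind);
}

/* Sort the results so that the best match ends up in pages[0]. */
void
sort_for_man(struct page *pages, size_t npages)
{
	qsort(pages, npages, sizeof(*pages), page_cmp);
}
```

For apropos(1) and man -a, the caller would simply skip this sorting step and keep the original order.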
In HTML output, allow switching the desired font for subsequent
text without printing an opening tag right away, and use that in
the .ft request handler. While here, garbage collect redundant
enum htmlfont and reduce code duplication in print_text().
Fixing an assertion failure reported by Michael <Stapelberg at Debian>
in pmRegisterDerived(3) from libpcp3-dev.
When calling an empty macro, do not clobber existing arguments.
Fixing a bug found with the groffer(1) version 1.19 manual page
following a report from Jan Stary.
Implement the roff .break request (break out of a .while loop).
Jan Stary <hans at stare dot cz> found it in an ancient groffer(1)
manual page (version 1.19) on MacOS X Mojave.
Having .break not implemented wasn't a particularly bright idea
because obviously, it tended to cause infinite loops.
Ingo Schwarze [Sun, 31 Mar 2019 19:17:26 +0000 (19:17 +0000)]
While we do encourage simplicity in the sense of writing plain '-'
for hyphen-minus, soften the language a bit: writing \- for it is
not wrong, and people started sending us patches to replace \- with '-'
in existing manual pages, which is not a worthwhile change unless
the \- is used at a place where it doesn't belong.
OK jmc@
Ingo Schwarze [Fri, 29 Mar 2019 21:27:06 +0000 (21:27 +0000)]
Set the maximum column index in a tbl(7) to the maximum *right* edge
of any cell span, not to the maximum *left* edge, which may be smaller
if the last column of the table is only reached by horizontal spans,
but not by any regular cell in any row of the table.
Otherwise, the algorithm calculating column widths accessed memory
after the end of the colwidth[] array, while it was trying to handle
the rightmost column(s).
Crash reported by Jason Thorpe <thorpej at NetBSD>
via https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54069
and via Thomas Klausner (wiz@).
Christos@ Zoulas sent a (correct, but slightly confusing) patch.
The patch i'm committing here is easier to understand.
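The difference between the two edges can be shown with a simplified model. The struct below is hypothetical and much simpler than the real tbl(7) data structures; it only illustrates why the right edge of a span has to be used when sizing a per-column array.

```c
#include <stddef.h>

/* Hypothetical cell description: leftmost column and columns spanned. */
struct cell {
	size_t	col;	/* zero-based index of the leftmost column */
	size_t	span;	/* number of columns covered, at least 1 */
};

/*
 * Return the largest column index touched by any cell, using the
 * right edge of each span.  Using the left edges only would yield
 * too small a value whenever the last column is reached solely by
 * a span.
 */
size_t
max_column(const struct cell *cells, size_t ncells)
{
	size_t i, right, maxcol = 0;

	for (i = 0; i < ncells; i++) {
		right = cells[i].col + cells[i].span - 1;
		if (right > maxcol)
			maxcol = right;
	}
	return maxcol;
}
```

For a row with one regular cell in column 0 and a two-column span starting in column 1, the maximum left edge is 1 but the maximum right edge is 2, so an array of column widths needs three entries.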
Ingo Schwarze [Tue, 19 Mar 2019 16:26:08 +0000 (16:26 +0000)]
When the last line of the input is empty and the previous line reduced
the line input buffer to a length of one byte, do not write one byte
past the end of the line input buffer. Minimal code to show the bug:
printf ".ds X\n.X\n\n" | MALLOC_OPTIONS=C mandoc
Bug found by bentley@ in the sysutils/rancid par(1) manual page.
Ingo Schwarze [Sun, 17 Mar 2019 18:21:45 +0000 (18:21 +0000)]
The header file "html.h" uses enum roff_tok,
so "roff.h" must be included before it.
Diff from bcallah@ tweaked by me;
he found the bug by compiling with pcc.
Ingo Schwarze [Sat, 16 Mar 2019 21:35:48 +0000 (21:35 +0000)]
When drawing a horizontal line in tbl(7) UTF-8 output, it is not
sufficient to look at two data rows, but up to three are needed:
the one above to identify vertical lines branching off upward, the
row itself (in case the line is in a data row rather than a layout
line) to figure out the horizontal line style, and the row below
to identify vertical lines branching off downward.
As an example, bentley@ reported from the mpv(1) manual page that
in a tbl(7) having a vertical line in the middle and a horizontal
line in the bottom data row, the vertical line extended below the
bottom horizontal line.
Ingo Schwarze [Wed, 13 Mar 2019 18:29:18 +0000 (18:29 +0000)]
Contrary to what the NetBSD attribute(3) manual page suggests,
using __dead instead of __attribute__((__noreturn__)) actually
hinders portability rather than helping it.
Given that mandoc already uses __attribute__ in several files
and that in the portable version, ./configure already contains
rudimentary support for ignoring it on platforms that do not
support it, use __attribute__ directly.
This is expected to fix build failures that Stephen Gregoratto
<dev at sgregoratto dot me> reported from Arch and Debian Linux.
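A sketch of the direct use of __attribute__ follows. The fallback macro is a common idiom and merely an assumption about how a build system might ignore the attribute; it is not claimed to be what ./configure does. The function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Assumed fallback for compilers without GNU attributes: make
 * __attribute__ expand to nothing.
 */
#if !defined(__GNUC__) && !defined(__attribute__)
#define __attribute__(x)
#endif

/* Hypothetical fatal-error helper that never returns. */
static void fatal(const char *msg) __attribute__((__noreturn__));

static void
fatal(const char *msg)
{
	fprintf(stderr, "fatal: %s\n", msg);
	exit(1);
}

/* Because fatal() is noreturn, no return is needed on the error path. */
int
checked_div(int num, int den)
{
	if (den == 0)
		fatal("division by zero");
	return num / den;
}
```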
Ingo Schwarze [Sun, 10 Mar 2019 09:23:33 +0000 (09:23 +0000)]
Automatically detect whether diff(1) supports the -a option.
Useful on illumos and on Oracle Solaris, where it doesn't.
Patch written based on a report from Sevan Janiyan.
Ingo Schwarze [Wed, 6 Mar 2019 10:18:58 +0000 (10:18 +0000)]
autoconfiguration test whether less(1) supports the -T option;
needed for Alpine Linux because it uses busybox less(1) by default;
based on a patch from Daniel Sabogal explained to me by Natanael Copa
Ingo Schwarze [Mon, 4 Mar 2019 18:15:06 +0000 (18:15 +0000)]
For TIOCGWINSZ, #include <termios.h> rather than <sys/termios.h>
like almost all other userland programs. This also improves
portability: for example, it looks like <sys/termios.h> does not
work on FreeBSD, or at least bapt@ did the same change over there.
Ingo Schwarze [Mon, 4 Mar 2019 13:01:57 +0000 (13:01 +0000)]
When the -S option is given to man(1) and the requested manual page
name is not found and the requested architecture is unknown, complain
about the architecture rather than about the manual page name:
$ man -S vax cpu
man: Unknown architecture "vax".
$ man -S sparc64 foobar
man: No entry for foobar in the manual.
Friendlier error message suggested by jmc@, who also OK'ed the patch.
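The choice between the two messages could be sketched as follows. The list of architectures is abbreviated and hypothetical, and so are the function names; only the two message formats are taken from the examples above.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical, abbreviated list of known architecture names. */
static const char *const known_archs[] = { "amd64", "arm64", "sparc64" };

static int
arch_is_known(const char *arch)
{
	size_t i;

	for (i = 0; i < sizeof(known_archs) / sizeof(known_archs[0]); i++)
		if (strcmp(arch, known_archs[i]) == 0)
			return 1;
	return 0;
}

/*
 * Format the message for a failed lookup into buf.  An unknown
 * architecture given with -S is reported in preference to the
 * missing page name.
 */
void
lookup_error(char *buf, size_t bufsz, const char *arch, const char *name)
{
	if (arch != NULL && !arch_is_known(arch))
		snprintf(buf, bufsz,
		    "man: Unknown architecture \"%s\".", arch);
	else
		snprintf(buf, bufsz,
		    "man: No entry for %s in the manual.", name);
}
```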
Ingo Schwarze [Mon, 4 Mar 2019 11:40:09 +0000 (11:40 +0000)]
Fix the last straggler where the struct roff_node "line" member
was abused to detect an input line break;
instead, use the NODE_LINE flag to improve robustness.
Ingo Schwarze [Sun, 3 Mar 2019 13:02:11 +0000 (13:02 +0000)]
Reset HTML formatter state, in particular the id_unique hash,
after processing each manual page, such that the next page
starts from a clean state and doesn't continue suffix numbering.
Issue found while looking at https://github.com/Debian/debiman/issues/48
which was brought up by Orestis Ioannou <oorestisime at github>.
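The effect of the missing reset can be reproduced with a toy model of the id bookkeeping. Everything in the sketch below is hypothetical: it uses a small fixed-size table instead of a hash, truncates long names, and assumes a suffix of the form _2, _3, and so on.

```c
#include <stdio.h>
#include <string.h>

#define MAX_IDS 64

/* Hypothetical formatter state: ids already used on the current page. */
struct id_state {
	char	 names[MAX_IDS][32];
	int	 counts[MAX_IDS];
	int	 n;
};

/* Forget all ids; to be called after each page has been formatted. */
void
id_reset(struct id_state *st)
{
	st->n = 0;
}

/*
 * Write a unique id for name into buf: the plain name on first use,
 * then name_2, name_3, ... on later uses within the same page.
 */
void
id_make(struct id_state *st, const char *name, char *buf, size_t bufsz)
{
	int i;

	for (i = 0; i < st->n; i++) {
		if (strcmp(st->names[i], name) == 0) {
			snprintf(buf, bufsz, "%s_%d", name, ++st->counts[i]);
			return;
		}
	}
	if (st->n < MAX_IDS) {
		snprintf(st->names[st->n], sizeof(st->names[0]), "%s", name);
		st->counts[st->n++] = 1;
	}
	snprintf(buf, bufsz, "%s", name);
}
```

Without the call to id_reset() between pages, the first NAME heading of the second page would already receive a numbered suffix.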
Ingo Schwarze [Sat, 2 Mar 2019 22:04:40 +0000 (22:04 +0000)]
Do not open a subsection for each and every macro.
Instead, use a tagged list and the canonical .Ic macro
as it is natural for such purposes.
While here, also delete heaps of needless escaping.
Ingo Schwarze [Sat, 2 Mar 2019 16:30:53 +0000 (16:30 +0000)]
Represent multiple subsequent .IP blocks having a consistent
head argument of *, \-, or \(bu as <ul> rather than as <dl>,
using a bit of heuristics.
Basic idea suggested by Dagfinn Ilmari Mannsaker <ilmari at github>
in https://github.com/Debian/debiman/issues/67 and independently by
<Pali dot Rohar at gmail dot com> on <discuss at mandoc dot bsd dot lv>.
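A sketch of such a heuristic is given below. It assumes that "consistent" means every head in the run is the same string; the function names are invented and the real formatter may decide differently.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if the .IP head argument is one of the bullet markers. */
static int
is_bullet(const char *head)
{
	return strcmp(head, "*") == 0 ||
	    strcmp(head, "\\-") == 0 ||
	    strcmp(head, "\\(bu") == 0;
}

/*
 * Hypothetical heuristic: a run of .IP blocks is an unordered list
 * if the first head is a bullet marker and all others equal it.
 */
int
ip_run_is_ul(const char *const heads[], size_t nheads)
{
	size_t i;

	if (nheads == 0 || !is_bullet(heads[0]))
		return 0;
	for (i = 1; i < nheads; i++)
		if (strcmp(heads[i], heads[0]) != 0)
			return 0;
	return 1;
}
```

Runs that fail the test, for example lists of options, would keep being formatted as <dl>.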
Ingo Schwarze [Fri, 1 Mar 2019 10:57:17 +0000 (10:57 +0000)]
Wrap .Sh/.SH sections and .Ss/.SS subsections in HTML <section> elements
as recommended for accessibility by the HTML 5 standard.
Triggered by a similar, but slightly different suggestion
from Laura Morales <lauretas at mail dot com>.
Ingo Schwarze [Thu, 28 Feb 2019 16:36:13 +0000 (16:36 +0000)]
Format multiple subsequent .IP or multiple subsequent .TP/.TQ
as a single <dl> list rather than opening a new list for each item;
feature suggested by Pali dot Rohar at gmail dot com.
Ingo Schwarze [Sat, 23 Feb 2019 18:53:54 +0000 (18:53 +0000)]
Explain the ASCII rendering of single quotes because that repeatedly
caused confusion in the past. People plainly do not expect that
there are limits to the compatibility between Unicode and ASCII,
but there are.
The information belongs here and not into mandoc_char(7) because
it explains how the specific output device (-T ascii) works and
because it has nothing to do with the question of how characters
are represented on the input side.
Ingo Schwarze [Sat, 9 Feb 2019 21:02:47 +0000 (21:02 +0000)]
The horizontal line in a data cell containing only "_" or "="
connects to the horizontally adjacent vertical line or cell;
fixing a bug reported by bentley@.