Decouple the token code for "no request or macro" from the individual
high-level parsers to allow further unification of functions that
only need to recognize this code, but that don't care about different
high-level macrosets beyond that.
Unify node handling functions:
* node_alloc() for mdoc and man_node_alloc() -> roff_node_alloc()
* node_append() for mdoc and man_node_append() -> roff_node_append()
* mdoc_head_alloc() and man_head_alloc() -> roff_head_alloc()
* mdoc_body_alloc() and man_body_alloc() -> roff_body_alloc()
* mdoc_node_unlink() and man_node_unlink() -> roff_node_unlink()
* mdoc_node_free() and man_node_free() -> roff_node_free()
* mdoc_node_delete() and man_node_delete() -> roff_node_delete()
Minus 130 lines of code, no functional change.
Delete the wrapper functions mdoc_meta(), man_meta(), mdoc_node(),
man_node() from the mandoc(3) semi-public interface and the internal
wrapper functions print_mdoc() and print_man() from the HTML formatters.
Minus 60 lines of code, no functional change.
Unify {mdoc,man}_{alloc,reset,free}() into roff_man_{alloc,reset,free}().
Minus 80 lines of code, no functional change.
Written on the train from Koeln to Wolfsburg returning from p2k15.
Move mdoc_hash_init() and man_hash_init() to libmandoc.h
and call them from mparse_alloc() and choose_parser(),
preparing unified allocation of struct roff_man.
Profit from the unified struct roff_man and reduce the number of
arguments of mparse_result() by one. No functional change.
Written on the ICE Bruxelles-Koeln on the way back from p2k15.
Replace the structs mdoc and man by a unified struct roff_man.
Almost completely mechanical, no functional change.
Written on the train from Exeter to London returning from p2k15.
On a new RS nesting level, the saved width starts from the default
width, not from the saved width of the previous level.
Improves xterm(1) and XSetEventQueueOwner(3); found in transcode_filter(1).
Use the default width for .RS without arguments.
Reduces groff-mandoc differences in base and Xenocara by about 4%.
Found while looking at wpa_supplicant(8).
If a partial explicit block extending to the next input line follows
the end macro of a broken block, put all of it into the breaking block.
Needed for example by mutella(1).
Reduce code duplication, no functional change:
Both partial and full implicit blocks can break explicit blocks.
Put the code to handle both cases into a common function.
Arguments to end macros of broken partial explicit blocks
must go inside the breaking block. For example, in
.It Ic cmd Oo
.Ar optional_arg Oc Ar mandatory_arg
the mandatory_arg is still inside the .It block.
Used for example by mutella(1).
Give man(7) section and subsection headers hanging indentation.
Reduces groff-mandoc differences in base by about 2.5% due to
various Perl manuals having long section titles.
Quirk found in argtable2(3).
Rounding rules for horizontal scaling widths are more complicated.
There is a first rounding to basic units on the input side.
After that, rounding rules differ between requests and macros.
Requests round to the nearest possible character position.
Macros round to the next character position to the left.
Implement that by changing the return value of term_hspan()
to basic units and leaving the second scaling and rounding stage
to the formatters instead of doing it in the terminal handler.
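A minimal sketch of the two rounding stages described above. It is not the
term_hspan() code itself; the function names are hypothetical, the constant
of 24 basic units per character cell is an assumption, and only
non-negative widths are handled.

```c
#include <assert.h>

/* Assumption for this sketch: 24 basic units per character cell. */
#define BU_PER_CHAR 24

/* First stage (input side): round a width in cells to basic units. */
int
width_to_bu(double cells)
{
	assert(cells >= 0.0);
	return (int)(cells * BU_PER_CHAR + 0.5);
}

/* Second stage for requests: nearest character position. */
int
request_cols(int bu)
{
	assert(bu >= 0);
	return (bu + BU_PER_CHAR / 2) / BU_PER_CHAR;
}

/* Second stage for macros: next character position to the left. */
int
macro_cols(int bu)
{
	assert(bu >= 0);
	return bu / BU_PER_CHAR;
}
```

With these assumptions, a width of 1.7 cells becomes 41 basic units, which
a request would round to 2 columns and a macro to 1 column.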
Don't allow breaking the output line after hyphens following escape
sequences. Improves tic(1), sxpm(1), and a few Perl manuals.
Quirk found by naddy@ in milter-greylist(8).
Fix a quirk with respect to empty .HP.
Found while writing a regression test for man_macro.c rev. 1.66.
Incidentally, this brings rendering of XFreeEventData(3) closer to groff.
Vastly simplify man(7) block unwinding, similar to mdoc_macro.c 1.171.
Drop one enum type, two static functions, 70 lines of code.
Also fixes the mpeg_encode(1) manual reported broken by naddy@.
It turns out the man(7) parser suffers from unintelligible handling
of block rewinding, just like the mdoc(7) parser did.
First step in getting rid of rew_scope():
Replace the only call where the target block is known.
This commit is analogous to mdoc_macro.c rev. 1.167.
One down, three to go.
No need to hardcode /usr/bin/ as the path to more(1); helps portability.
We don't hardcode the paths to gunzip(1) and cmp(1) either.
Discussed with ajacoutot@.
Third step towards parser unification:
Replace struct mdoc_meta and struct man_meta by a unified struct roff_meta.
Written on the train from London to Exeter on the way to p2k15.
Second step towards parser unification:
Replace struct mdoc_node and struct man_node by a unified struct roff_node.
To be able to use the tok member for both mdoc(7) and man(7) without
defining all the macros in roff.h, sacrifice a tiny bit of type safety
and make tok an int rather than an enum.
Almost mechanical, no functional change.
Written on the Eurostar from Bruxelles to London on the way to p2k15.
First step towards parser unification:
Replace enum mdoc_type and enum man_type by a unified enum roff_type.
Almost mechanical, no functional change.
Written on the ICE train from Frankfurt to Bruxelles on the way to p2k15.
Let man(1) and apropos(1) work even when the current directory
is unusable: Only change back to the current directory when the
directory was changed before and the next path is relative.
This is now more similar to what makewhatis(8) does.
Issue reported by espie@.
Ingo Schwarze [Mon, 30 Mar 2015 16:06:14 +0000 (16:06 +0000)]
Escape punctuation characters that have a different meaning in -Tpdf.
~, `, and ' get translated to non-ASCII characters by most troff
implementations when generating PostScript/PDF output. When the
original ASCII character is meant, it needs to be manually escaped.
Ingo Schwarze [Fri, 27 Mar 2015 16:36:31 +0000 (16:36 +0000)]
Modernize documentation by inserting blanks between option letters
and option arguments, except for -m because "-m an" and "-m andoc"
look just too weird. Of course, the traditional form without the
blank will continue to work.
Ingo Schwarze [Fri, 27 Mar 2015 00:57:28 +0000 (00:57 +0000)]
Document that certain stand-alone accents need escaping in rare cases to
prevent them from being converted to Unicode replacements in PDF output.
Issue found by bentley@, OK jmc@ bentley@.
Ingo Schwarze [Fri, 27 Mar 2015 00:18:14 +0000 (00:18 +0000)]
Add man.conf(5). After adding some additional functionality,
one of the next steps will be to use it in addition to manpath(1)
rather than as an alternative to it.
Ingo Schwarze [Thu, 26 Mar 2015 22:42:32 +0000 (22:42 +0000)]
Add a new directive "manpath path"
to replace the legacy "_whatdb path/whatis.db".
Keep _whatdb support for backward compat, for now.
Discussed with many, jmc@ and ajacoutot@ agree with the general direction.
Ingo Schwarze [Fri, 20 Mar 2015 15:25:12 +0000 (15:25 +0000)]
Patch from Christian Neukirchen <chneukirchen at gmail dot com>:
He reports that on some platforms, it is not possible to use the
same va_list twice. So use va_copy(3) for additional safety.
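A sketch of the general pattern, not the patched mandoc function: when the
same variable arguments have to be traversed twice (here, once to measure
and once to format), take a copy with va_copy(3) before the first use.
The function name format_alloc() is made up for this example.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format into a freshly allocated string; returns NULL on failure. */
char *
format_alloc(const char *fmt, ...)
{
	va_list	 ap, ap2;
	char	*buf;
	int	 len;

	assert(fmt != NULL);
	va_start(ap, fmt);
	va_copy(ap2, ap);		/* second, independent traversal */

	len = vsnprintf(NULL, 0, fmt, ap);	/* first pass: measure */
	va_end(ap);
	if (len < 0) {
		va_end(ap2);
		return NULL;
	}

	buf = malloc((size_t)len + 1);
	if (buf != NULL)		/* second pass: use the copy */
		vsnprintf(buf, (size_t)len + 1, fmt, ap2);
	va_end(ap2);
	return buf;
}
```

The caller owns the returned buffer and has to free(3) it.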
Ingo Schwarze [Fri, 20 Mar 2015 12:54:22 +0000 (12:54 +0000)]
Simplify by almost halving the number of macro flags:
1. MAN_EXPLICIT was used iff fp == blk_exp, so just test fp.
2. MAN_FSCOPED was used only for TP, so just test for TP.
3. MAN_NOCLOSE was completely unused.
No functional change.
Ingo Schwarze [Thu, 19 Mar 2015 14:57:29 +0000 (14:57 +0000)]
Compat glue needed for Solaris 9 and 10.
Thanks to Sevan Janiyan <venture37 at geeklan dot co dot uk> for
reporting the Solaris 10 issues, to Jan Holzhueter <jh at opencsw
dot org> for some additional insight, and to OpenCSW in general for
providing me with a Solaris 9/10/11 testing environment.
Ingo Schwarze [Wed, 18 Mar 2015 19:29:48 +0000 (19:29 +0000)]
We always use FTS_NOCHDIR, so delete the directory changing code.
This not only simplifies matters, but also helps operating systems
lacking dirfd(3), for example Solaris 10. Solaris dirfd issue
reported by Sevan Janiyan <venture37 at geeklan dot co dot uk>.
Ingo Schwarze [Tue, 17 Mar 2015 07:33:07 +0000 (07:33 +0000)]
When the user exits the pager before the pager has drained all input
from man(1), man(1) dies from SIGPIPE. Exiting man(1) is fine in this
case, generating more output would be pointless, but without handling
SIGPIPE, the exit code from man(1) was wrong and csh(1) printed an
ugly message "Broken pipe". Fix this by handling SIGPIPE explicitly.
Issue noticed by deraadt@.
Ingo Schwarze [Sun, 15 Mar 2015 16:53:41 +0000 (16:53 +0000)]
Avoid off-by-one read access to the termacts array, which could
sometimes result in missing line breaks before subsection headers.
Found by carsten dot kunze at arcor dot de on SuSE 13.2.
Ingo Schwarze [Fri, 13 Mar 2015 20:20:07 +0000 (20:20 +0000)]
Remove the first comma from constructs like ", and," and ", or,":
You can use "and" and "or" to join sentence clauses,
and you can use commas, but using both hinders reading;
patch from jmc@.
Ingo Schwarze [Fri, 13 Mar 2015 00:19:41 +0000 (00:19 +0000)]
Fix hardlink detection on platforms having padding in struct inodev,
typically 64bit platforms. This was basically broken since forever.
Not only is the padding used, but it was used uninitialized.
Problem reported by jmc@.
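A sketch of the kind of bug described above, using a made-up struct
file_key rather than the real struct inodev. If a struct is hashed or
compared bytewise, its padding bytes take part, so they need a defined
value. Zeroing the whole struct before filling in the members is the
usual remedy; it works on common compilers, although the C standard
leaves padding contents unspecified after a member store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical key; on typical 64-bit ABIs, dev is followed by padding. */
struct file_key {
	uint32_t dev;
	uint64_t ino;
};

/* Zero everything first, so that padding bytes are not left as garbage. */
void
file_key_init(struct file_key *k, uint32_t dev, uint64_t ino)
{
	assert(k != NULL);
	memset(k, 0, sizeof(*k));
	k->dev = dev;
	k->ino = ino;
}

/* Bytewise comparison, as a hash table keyed on raw bytes would do. */
int
file_key_same(const struct file_key *a, const struct file_key *b)
{
	return memcmp(a, b, sizeof(*a)) == 0;
}
```

Without the memset(3), two keys for the same file could differ in their
padding bytes and the hardlink would go undetected.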
Ingo Schwarze [Wed, 11 Mar 2015 13:15:44 +0000 (13:15 +0000)]
When manpath(1) is available, enable HAVE_MANPATH even when building
without database support. Required now that we have man(1) even
without database support.
Ingo Schwarze [Tue, 10 Mar 2015 13:50:03 +0000 (13:50 +0000)]
We can keep track of the pager PID without additional complexity.
No functional change for now, but more robust in case anybody should
ever add additional child processes.
Ingo Schwarze [Tue, 10 Mar 2015 03:02:28 +0000 (03:02 +0000)]
Fix a regression caused in rev. 1.212, reported by kristaps@:
When using a pager and the first manual shown is gzip'ed,
the gunzip(1) process ended up as a child of the pager process
such that the man(1) process couldn't wait for it, preventing
proper display of the manual.
Solve this by making the pager a child of the man(1) process
(instead of the other way round), which requires being a bit
more careful about properly closing file descriptors after use
and waiting for the pager before exiting man(1).
Ingo Schwarze [Fri, 6 Mar 2015 15:48:52 +0000 (15:48 +0000)]
Fix vertical spacing at the beginning of tables.
man(7) always prints a blank line, mdoc(7) doesn't.
Problem in mdoc(7) reported by kristaps@.
mdoc(7) part of the patch tested by kristaps@.
Ingo Schwarze [Fri, 6 Mar 2015 11:03:03 +0000 (11:03 +0000)]
Flush the line preceding a table before clearing the right margin,
such that that line isn't output with unlimited width.
Problem reported and fix OK by kristaps@.
Ingo Schwarze [Mon, 2 Mar 2015 14:50:17 +0000 (14:50 +0000)]
If a non-gz manual is read after a gzipped manual, refrain
from throwing a bogus error "wait: No child processes".
As reported by Baptiste Daroussin <bapt at FreeBSD dot org>,
clearing the state variable curp->child after use was forgotten.
Ingo Schwarze [Fri, 27 Feb 2015 16:22:09 +0000 (16:22 +0000)]
When makewhatis(8) scans a tree, ignore trailing garbage on filenames.
This is relevant because some ports install files like man1/xsel.1x,
as reported by patrick keshishian <pkeshish at gmail dot com> on misc@.
We can probably improve functionality and simplify the code by ignoring
file name extensions altogether; we already know the section number from
the name of the directory. But so close to lock, i'm keeping the fix
minimal.
Ingo Schwarze [Fri, 27 Feb 2015 16:02:10 +0000 (16:02 +0000)]
When man(1) and apropos(1) look for a file man1/foo.1 but it's unavailable,
fall back to glob(man1/foo.*), which is more like what old man(1) did.
Do this both for file names from the database and for fs_lookup().
This is relevant because some ports install files like man1/xset.1x.
Regression reported by patrick keshishian <pkeshish at gmail dot com>.
Ingo Schwarze [Fri, 20 Feb 2015 23:55:10 +0000 (23:55 +0000)]
For selecting a two-digit font size, support the historic syntax \s12
in addition to the classic syntax \s(12, the modern syntax \s[12],
and the alternative syntax \s'12'. The historic syntax only works
for the font sizes 10-39.
Real-world usage found by naddy@ in plan9/rc.
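The four syntaxes can be told apart by the first character after \s.
The following is a sketch with a hypothetical function name, not the
mandoc code; signs and relative sizes are left out, and a bare digit
outside the 10-39 range is taken as a one-digit size.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Parse the text following "\s".  Store the size in *size and
 * return the number of bytes consumed, or 0 on a syntax error.
 */
size_t
parse_font_size(const char *p, int *size)
{
	const char	*start = p;
	char		 term = '\0';
	int		 ndigits = 1, n = 0;

	assert(p != NULL && size != NULL);
	switch (*p) {
	case '(':			/* classic: \s(12 */
		p++;
		ndigits = 2;
		break;
	case '[':			/* modern: \s[12] */
		p++;
		term = ']';
		break;
	case '\'':			/* alternative: \s'12' */
		p++;
		term = '\'';
		break;
	default:			/* historic: \s12, sizes 10-39 only */
		if (*p >= '1' && *p <= '3' && isdigit((unsigned char)p[1]))
			ndigits = 2;
		break;
	}
	if (term != '\0') {
		if (!isdigit((unsigned char)*p))
			return 0;
		while (isdigit((unsigned char)*p)) {
			n = n * 10 + (*p++ - '0');
			if (n > 999)	/* arbitrary cap for this sketch */
				return 0;
		}
		if (*p++ != term)
			return 0;
	} else {
		while (ndigits-- > 0) {
			if (!isdigit((unsigned char)*p))
				return 0;
			n = n * 10 + (*p++ - '0');
		}
	}
	*size = n;
	return (size_t)(p - start);
}
```

Under these rules, "\s12" selects size 12, while "\s45" selects size 4
and leaves the "5" as ordinary text.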
Ingo Schwarze [Fri, 20 Feb 2015 22:40:38 +0000 (22:40 +0000)]
Completely delete all carriage return characters from the input.
No change to messages about them (ignore them right before line feeds,
report errors elsewhere).
naddy@ found a manual in the wild containing lots of these (ysm(1)),
and i can't imagine a situation where dropping them could be problematic.
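Dropping the characters can be done in place with a single pass. This is
a sketch with a made-up function name, not the code from the commit, and
it leaves out the message reporting mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Remove all carriage returns from buf in place; return the new length. */
size_t
strip_cr(char *buf, size_t len)
{
	size_t in, out = 0;

	assert(buf != NULL || len == 0);
	for (in = 0; in < len; in++)
		if (buf[in] != '\r')
			buf[out++] = buf[in];
	return out;
}
```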
Ingo Schwarze [Tue, 17 Feb 2015 20:37:16 +0000 (20:37 +0000)]
Render \(lq and \(rq as '"' in -Tascii mode but leave the rendering
of .Do/.Dc, .Dq, .Lb, and .St untouched.
Reduces groff-mandoc differences in OpenBSD base by about 7%.
Reminded of the issue by naddy@.
Ingo Schwarze [Tue, 17 Feb 2015 18:09:14 +0000 (18:09 +0000)]
Cope with another one of the many kinds of DocBook stupidity:
Instead of just using .br, DocBook sometimes fiddles with the
utterly unportable internal register \n[an-break-flag] that is
only available in the GNU implementation of man(7) and then arms
an input line trap to call the equally unportable internal macro
.an-trap that, in the GNU implementation, inspects that variable;
all the world is GNU, isn't it?
Since naddy@ reports that quite a few ports manuals suffer from
this insanity, let's just translate it to the intended .br.
Ingo Schwarze [Tue, 17 Feb 2015 17:16:52 +0000 (17:16 +0000)]
Let .it accept numerical expressions, not just numerical constants.
For .it, ignore scaling units in roff_getnum().
Inside parentheses, skip whitespace after a sign in roff_getnum().
Parse and ignore unary plus in roff_getnum().
As a bonus, get rid of the only call to mandoc_strntoi() in roff.c.
Ingo Schwarze [Mon, 16 Feb 2015 16:23:54 +0000 (16:23 +0000)]
Delete the -V option. It serves no purpose but keeps confusing people.
Keeping track of the versions of installed software is the job of
the package manager, not of the individual binaries. If individual
binaries include version numbers, that tends to goad people into
writing broken configuration tests that inspect version numbers
instead of properly testing for features.
Ingo Schwarze [Sun, 15 Feb 2015 17:57:45 +0000 (17:57 +0000)]
Tweak the wording to avoid the possible misunderstanding that .In
could only be used in the SYNOPSIS section. It is fine anywhere.
Issue noticed by bentley@.
Ingo Schwarze [Thu, 12 Feb 2015 13:54:50 +0000 (13:54 +0000)]
After almost five years and 99 revisions, mdoc_macro.c rev. 1.182
finally fixed the four issues explained in the mdoc_macro.c rev. 1.83
commit message.
Ingo Schwarze [Thu, 12 Feb 2015 13:00:52 +0000 (13:00 +0000)]
Do not confuse .Bl -column lists that just broke another block
with newly opened .Bl -column lists;
fixing an assertion failure jsg@ found with afl:
test case #481, Bl It Bl -column It Bd El text text El
Ingo Schwarze [Thu, 12 Feb 2015 12:24:33 +0000 (12:24 +0000)]
Delete the mdoc_node.pending pointer and the function calculating
it, make_pending(), which was the most difficult function of the
whole mdoc(7) parser. After almost five years of maintaining this
hellhole, i just noticed the pointer isn't needed after all.
Blocks are always rewound in the reverse order they were opened;
that even holds for broken blocks. Consequently, it is sufficient
to just mark broken blocks with the flag MDOC_BROKEN and breaking
blocks with the flag MDOC_ENDED. When rewinding, instead of iterating
the pending pointers, just iterate from each broken block to its
parents, rewinding all that are MDOC_ENDED and stopping after
processing the first ancestor that is not MDOC_BROKEN. For ENDBODY
markers, use the mdoc_node.body pointer in place of the former
mdoc_node.pending.
This also fixes an assertion failure found by jsg@ with afl,
test case #467 (Bo Bl It Bd Bc It), where (surprise surprise)
the pending pointer got corrupted.
Improved functionality, minus one function, minus one struct field,
minus 50 lines of code.
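A heavily simplified sketch of the rewinding walk, following a literal
reading of the description above. It is not the mdoc_macro.c code: the
struct, the flag names BLK_BROKEN and BLK_ENDED, and the function are
hypothetical, only block nodes are modelled, and "rewinding" is reduced
to clearing an open flag.

```c
#include <assert.h>
#include <stddef.h>

#define BLK_BROKEN	0x1	/* stands in for MDOC_BROKEN */
#define BLK_ENDED	0x2	/* stands in for MDOC_ENDED */

struct blk {
	struct blk	*parent;
	int		 flags;
	int		 open;
};

/*
 * Close n.  While the block just handled is marked broken, move to
 * its parent and close that as well if it is marked as ended.
 * Stop after processing the first block that is not marked broken.
 */
void
rewind_block(struct blk *n)
{
	assert(n != NULL);
	n->open = 0;
	while (n->flags & BLK_BROKEN) {
		if ((n = n->parent) == NULL)
			break;
		if (n->flags & BLK_ENDED)
			n->open = 0;
	}
}
```

No separate pending pointer is needed for this walk; the parent links and
the two flags carry all the required state.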
Ingo Schwarze [Tue, 10 Feb 2015 17:47:45 +0000 (17:47 +0000)]
Be more careful to not generate empty .In, .St, and .Xr nodes.
That could happen when their first argument was another called macro,
causing a NULL pointer access in .St validation found by jsg@ with afl.
Make in_line_argn() easier to understand by using one state
variable rather than two.
Ingo Schwarze [Tue, 10 Feb 2015 11:03:13 +0000 (11:03 +0000)]
Do not read past the end of the buffer if an "f" layout font modifier
is followed by the end of the input line instead of a font specifier.
Found by jsg@ with afl, test case #591.
While here, improve functionality as well:
* There is no "r" font modifier.
* Font specifiers (as opposed to font modifiers) are case sensitive.
* One-character font specifiers require trailing whitespace.
* Ignore parenthesized and two-letter font specifiers.
Ingo Schwarze [Sat, 7 Feb 2015 16:42:33 +0000 (16:42 +0000)]
Closing a block validates it, which may end up deleting it,
so if we are in a loop over blocks, cleanly restart the loop
rather than risking use after free; found by jsg@ with afl.