Ingo Schwarze [Fri, 8 Jan 2016 15:02:54 +0000 (15:02 +0000)]
Prefer warn(3) over perror(3) at the few places where it was used.
It is useful to see the program name, and we have err.h compat in place anyway.
Suggested by Christos Zoulas (NetBSD).
Ingo Schwarze [Fri, 8 Jan 2016 02:53:13 +0000 (02:53 +0000)]
Simplify the mparse_open() interface.
Just return the file descriptor or -1 on error;
there is just one kind of error anyway.
Suggested by Christos Zoulas (NetBSD).
Ingo Schwarze [Fri, 8 Jan 2016 02:13:39 +0000 (02:13 +0000)]
It was very surprising that a function called mparse_readfd()
closed the file descriptor passed to it after completing its work,
in particular considering the fact that it required its callers
to call open(2) or mparse_open() beforehand.
Change mparse_readfd() to not call close(2) and change the callers
to call close(2) afterwards, more or less bringing open and close
to the same level of the code and making review easier. Note that
man.cgi(8) already did that, even though it was wrong in the past.
Small restructuring suggested by Christos Zoulas (NetBSD).
Ingo Schwarze [Fri, 8 Jan 2016 00:50:45 +0000 (00:50 +0000)]
The root of an .EQ tree is always EQN_ROOT, never EQN_LIST,
so delete a redundant NULL check that confused Coverity in CID 1257471;
issue reported by wiz@, patch differs from what christos@ did in NetBSD.
No functional change.
Ingo Schwarze [Thu, 7 Jan 2016 21:03:54 +0000 (21:03 +0000)]
This code wasted memory by allocating sizeof(enum termfont *)
where only sizeof(enum termfont) is needed.
Fixes CID 1288941. From christos@ via wiz@, both at NetBSD.
Ingo Schwarze [Thu, 7 Jan 2016 20:19:01 +0000 (20:19 +0000)]
Recursive "define" was not detected because "lim" was never
incremented, causing infinite loops.
Fixing CID 1288962. From christos@ via wiz@, both at NetBSD.
Ingo Schwarze [Mon, 4 Jan 2016 14:44:57 +0000 (14:44 +0000)]
Improve handling of .Va and .Vt macros.
tedu@ noticed that no Vt= database entries were generated.
Serguey Parkhomovsky suggested the deletion of parse_mdoc_body().
tb@ noticed that the fix requires more than just adding TYPE_Vt
to the MDOC_Vt mask in the mdoc_handler array.
Ingo Schwarze [Tue, 15 Dec 2015 17:38:45 +0000 (17:38 +0000)]
pledge(2) style:
Make sure to always use the idiom 'if (pledge("'
such that it can easily be searched for.
No functional change.
Requested by deraadt@ some time ago.
Ingo Schwarze [Thu, 26 Nov 2015 07:42:11 +0000 (07:42 +0000)]
No point in trying to go on when elementary database operations
like preparing queries or binding variables fail; that won't yield
useful results anyway but may generate huge pointless error messages.
Issue reported by deraadt@.
Ingo Schwarze [Fri, 20 Nov 2015 21:59:54 +0000 (21:59 +0000)]
Fix multiple issues regarding process group and signal mask handling
found by tb@ and millert@; parts of the code, in particular in tag.c,
by millert@; OK millert@.
Ingo Schwarze [Sat, 14 Nov 2015 23:57:47 +0000 (23:57 +0000)]
Fix an issue reported by deraadt@: When hitting Ctrl-Backslash (= SIGQUIT)
in the less(1) spawned by man(1), man(1) died uncleanly, leaving behind
its temp files, and killed less(1) uncleanly as well with SIGPIPE,
leaving the terminal in the wrong state.
Fix this by giving less(1) its own process group and handing it
control of the terminal, but in such a way that Ctrl-z (= SIGSTOP)
still works: In that case, let man(1) stop itself, too, and let it
continue the pager when it continues itself.
Joint work with millert@ who contributed most of the expertise
required, and also most parts of the code.
OK deraadt@ millert@
Ingo Schwarze [Thu, 12 Nov 2015 22:44:27 +0000 (22:44 +0000)]
Simplify the logic in mandoc_normdate() and add some comments.
Also add a comment in time2a() explaining why it isn't possible
to use just one single call to strftime().
Do some style cleanup while here.
No functional change.
Triggered by a very different patch from des@FreeBSD.
Ingo Schwarze [Thu, 12 Nov 2015 21:50:03 +0000 (21:50 +0000)]
Never use LC_ALL. On the one hand, it can cause misformatting.
On the other hand, it is a security risk because it might cause
buffer overflows. Use LC_CTYPE only, that's all we need.
Ingo Schwarze [Sat, 7 Nov 2015 17:58:55 +0000 (17:58 +0000)]
Modernization, no functional change intended:
Use the POSIX function getline(3) rather than the slightly
dangerous BSD function fgetln(3).
Remove the related compatibility code.
Ingo Schwarze [Sat, 7 Nov 2015 14:22:29 +0000 (14:22 +0000)]
Without HAVE_ERR, don't try to include <err.h>, it probably isn't there.
In that case, the required prototypes are in "config.h".
Patch from Peter Bray <pdb_ml at yahoo dot com dot au>.
Ingo Schwarze [Sat, 7 Nov 2015 14:01:16 +0000 (14:01 +0000)]
In private header files, __BEGIN_DECLS and __END_DECLS are pointless.
Because these work slightly differently on different systems,
they are becoming a maintenance burden in the portable version,
so delete them.
Besides, one of the chief design goals of the mandoc toolbox is to
make sure that nothing related to documentation requires C++.
Consequently, linking mandoc against any kind of C++ program would
defeat the purpose and is not supported.
I don't understand why kristaps@ added them in the first place.
Ingo Schwarze [Fri, 6 Nov 2015 16:30:33 +0000 (16:30 +0000)]
Use getprogname(3) rather than __progname.
Suggested by Joerg@ Sonnenberger (NetBSD).
Last year, deraadt@ confirmed on tech@ that this "has the potential
to be more portable", and micro-optimizing for speed is not relevant
here. Also gets rid of one global variable.
Ingo Schwarze [Thu, 5 Nov 2015 20:55:41 +0000 (20:55 +0000)]
Use include files "header.html" and "footer.html" rather than a
compiled-in string. This is not a security risk, we read the file
manpath.conf from the same directory, anyway. No error handling
is needed; even if the files are absent, that's not an error.
This is more flexible without causing complication of the code or
the user interface. It helps the upcoming revamp of the online
manual pages on man.NetBSD.org.
Based on an idea by Jean-Yves Migeon <jeanyves dot migeon at free dot fr>,
but implemented in a much simpler way.
Ingo Schwarze [Fri, 30 Oct 2015 19:04:16 +0000 (19:04 +0000)]
If a .Bd block has no arguments at all, drop the block and only keep
its contents. Removing a gratuitious difference to groff output
found after a related bug report from krw@.
Ingo Schwarze [Thu, 22 Oct 2015 21:03:43 +0000 (21:03 +0000)]
If no output device was allocated because no file wanted to produce output,
refrain from dereferencing a NULL pointer during final deallocation.
Fixing a recent regression reported by czarkoff@
Ingo Schwarze [Wed, 21 Oct 2015 23:51:11 +0000 (23:51 +0000)]
Move all mdoc(7) node validation done before child parsing
to the new separate validation pass, except for a tiny bit
needed by the parser which goes to the new mdoc_state() module;
cleaner, simpler, and surprisingly also shorter by 15 lines.
Ingo Schwarze [Tue, 20 Oct 2015 02:01:31 +0000 (02:01 +0000)]
In order to become able to generate syntax tree nodes on the roff(7)
level, validation must be separated from parsing and rewinding.
This first big step moves calling of the mdoc(7) post_*() functions
out of the parser loop into their own mdoc_validate() pass, while
using a new mdoc_state() module to make syntax tree state handling
available to both the parser loop and the validation pass.
Ingo Schwarze [Sat, 17 Oct 2015 00:21:07 +0000 (00:21 +0000)]
Very tricky diff to fix macro interpretation and spacing around tabs
in .Bl -column; it took me more than a day to get this right.
Triggered by a loosely related bug report from tim@.
The lesson for you is: Use .Ta macros in .Bl -column, avoid tabs,
or you are in for surprises: The last word before a tab is not
interpreted as a macro (unless there is a blank in between), the
first word after a tab isn't either (unless there is a blank in
between), and a blank after a tab causes a leading blank in the
respective output cell. Yes, "blank", "tab", "blank tab" and "tab
blank" all have different semantics; if you write code relying on
that, good luck maintaining it afterwards...
Ingo Schwarze [Thu, 15 Oct 2015 22:45:43 +0000 (22:45 +0000)]
Simplify the part of args() that is handling .Bl -column phrases:
Delete manual "Ta" handling because macro handling should
not be done in an argument parser but should be left to the
macro parsers, which exist anyway and work well.
No functional change, minus 40 lines of code.
Confusing and redundant code found while investigating
an old bug report from tim@.
Ingo Schwarze [Thu, 15 Oct 2015 22:27:24 +0000 (22:27 +0000)]
When blk_full() handles an .It line in .Bl -column and indirectly
calls phrase_ta() to handle a .Ta child macro, advance the body
pointer accordingly, such that a subsequent tab character rewinds
the right body block and doesn't fail an assertion. That happened
when there was nothing between the .Ta and the tab character.
Bug reported by tim@ some time ago.
Ingo Schwarze [Tue, 13 Oct 2015 23:30:50 +0000 (23:30 +0000)]
Reject the escape sequences \[uD800] to \[uDFFF] in the parser.
These surrogates are not valid Unicode codepoints,
so treat them just like any other undefined character escapes:
Warn about them and do not produce output.
Issue noticed while talking to stsp@, semarie@, and bentley@.
Ingo Schwarze [Tue, 13 Oct 2015 22:59:54 +0000 (22:59 +0000)]
Major character table cleanup:
* Use ohash(3) rather than a hand-rolled hash table.
* Make the character table static in the chars.c module:
There is no need to pass a pointer around, we most certainly
never want to use two different character tables concurrently.
* No need to keep the characters in a separate file chars.in;
that merely encourages downstream porters to mess with them.
* Sort the characters to agree with the mandoc_chars(7) manual page.
* Specify Unicode codepoints in hex, not decimal (that's the detail
that originally triggered this patch).
No functional change, minus 100 LOC, and i don't see a performance change.
Ingo Schwarze [Tue, 13 Oct 2015 15:53:05 +0000 (15:53 +0000)]
Reduce the amount of code by moving the three copies of the ohash
callback functions into one common place, preparing for the use of
ohash for some additional purposes. No functional change.
Ingo Schwarze [Mon, 12 Oct 2015 21:26:02 +0000 (21:26 +0000)]
Delete an assignment that is unconditionally overwritten two lines later;
found by Svyatoslav Mishyn <juef at openmailbox dot org>
with the clang static analyzer.
Ingo Schwarze [Mon, 12 Oct 2015 21:09:54 +0000 (21:09 +0000)]
Check the right pointer against NULL;
fixing a pasto introduced in the previous commit;
found by Svyatoslav Mishyn <juef at openmailbox dot org> with cppcheck.
Ingo Schwarze [Mon, 12 Oct 2015 15:29:35 +0000 (15:29 +0000)]
Use "-" rather than "\(hy" for the heads of .Bl -dash and -hyphen lists.
In UTF-8 output, that renders as ASCII HYPHEN-MINUS (U+002D)
rather than HYPHEN (U+2010), which looks better and matches groff.
In ASCII output, it makes no difference.
Suggested by naddy@.
Ingo Schwarze [Mon, 12 Oct 2015 00:32:55 +0000 (00:32 +0000)]
Clear dform and dsec when exiting a first-level directory in treescan().
Fixes a segfault reported by bentley@.
While here, do some style cleanup in the same function.
Ingo Schwarze [Mon, 12 Oct 2015 00:08:15 +0000 (00:08 +0000)]
To make the code more readable, delete 283 /* FALLTHROUGH */ comments
that were right between two adjacent case statement. Keep only
those 24 where the first case actually executes some code before
falling through to the next case.
Ingo Schwarze [Sun, 11 Oct 2015 22:00:52 +0000 (22:00 +0000)]
Drop tags containing a blank character:
They don't work, they break other tags in weird ways, and even
if they could be made to work, they would be mostly useless.
Issue reported by naddy@, thanks.
Ingo Schwarze [Sun, 11 Oct 2015 21:12:54 +0000 (21:12 +0000)]
Finally use __progname, err(3) and warn(3).
That's more readable and less error-prone than fumbling around
with argv[0], fprintf(3), strerror(3), perror(3), and exit(3).
It's a bad idea to boycott good interfaces merely because standards
committees ignore them. Instead, let's provide compatibility modules
for archaic systems (like commercial Solaris) that still don't have
them. The compat module has an UCB Copyright (c) 1993...
Fix multiple aspects of SYNOPSIS .Nm formatting:
* Don't break lines before non-block .Nm elements.
* Use proper <b> markup for the heads of .Nm blocks.
* Make the width measurements work by doing them on the head children.
Trailing whitespace is significant when determining the width of a tag
in mdoc(7) .Bl -tag and man(7) .TP, but not in man(7) .IP.
Quirk reported by Jan Stary <hans at stare dot cz> on ports@.
Remove the warning about children of .Vt blocks because actually,
.Vt type global_variable No = Dv defined_constant ;
is the best way to specify in the SYNOPSIS how a global variable
is initialized in the rare case where that matters.
Issue noticed by jmc@.
Fill mode changes don't break next-line scope in all cases,
in particular not for tagged paragraphs.
Issue found by Christian Neukirchen <chneukirchen at gmail dot com>
in the exiv2(1) manual page.
Recommend an unambiguous escape for minus signs instead of \-.
Historically, \- was used in troff for three cases: flags/pathnames,
en dashes, and minus signs. mandoc_char(7) currently recommends it
for minus signs, recommends \(en for en dashes, and doesn't mention
flags/pathnames.
In the old days, nroff rendered \- as ASCII '-', and troff rendered
it as en dash/minus (which were visually indistinguishable).
In Unicode, en dashes and minus signs are semantically distinct and
encoded differently (U+2013 for en dash, U+2212 for minus), and
often rendered differently too. Meanwhile ASCII '-' has been renamed
"hyphen-minus" and fonts typically render it closest to a hyphen, not
a minus.
There is very little consistency across roff implementations and output
formats for what Unicode character \- corresponds to. So at least for
minus signs, change the recommendation to the unambiguous \(mi escape.
from bentley@, ok jmc@ (after reams of discussion)
Ingo Schwarze [Sun, 30 Aug 2015 21:10:56 +0000 (21:10 +0000)]
Drop leading, internal, and trailing blank characters in \o (overstrike)
escape sequences; that's cleaner for all output modes, and it's required
to prevent the PostScript/PDF formatter from dying on assertions.
Bug found by jsg@ with afl.
Ingo Schwarze [Sat, 29 Aug 2015 23:56:01 +0000 (23:56 +0000)]
If we have to reparse the text line because we spring an input line trap,
we must not escape breakable hyphens yet, or mparse_buf_r() in read.c
will complain and replace the escaped hyphens with question marks.
Bug found in ocserv(8) following a report from Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 22:40:05 +0000 (22:40 +0000)]
Parse and ignore the escape sequences \, and \/ (italic corrections).
Actually using these is very stupid because they are groff extensions
and other roff(7) implementations typically print unintended characters
at the places where they are used.
Nevertheless, some manuals contain them, for example ocserv(8).
Problem reported by Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 21:37:20 +0000 (21:37 +0000)]
Implement the escape sequence \\$*, expanding to all arguments
of the current user-defined macro.
This is another missing feature required for ocserv(8).
Problem reported by Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 20:26:04 +0000 (20:26 +0000)]
Minimal implementation of the read-only number register \n(.$
which returns the number of arguments of the current macro.
This is one of the missing features required for ocserv(8).
Problem reported by Kurt Jaeger <pi at FreeBSD>.
Ingo Schwarze [Sat, 29 Aug 2015 15:28:13 +0000 (15:28 +0000)]
Including <ohash.h> requires including <stdint.h> before,
and "config.h" was missing as well.
Patch from Svyatoslav Mishyn <juef and openmailbox dot org>, Crux Linux.
Remove the hack of scrolling forward and backward with +G1G that
many (jmc@, millert@, espie@, deraadt@) considered revolting.
Instead, when using a pager, since we are using a temporary file
for tags anyway, use another temporary file for the formatted
page(s), as suggested by millert@ and similar to what the traditional
BSD man(1) did, except that we use only one single temporary output
file rather than one for each formatted manual page, such that
searching (both with / and :t) works across all the displayed files.
Simplify and make tag_put() more efficient by integrating tag_get()
into it and by only handling NUL-terminated strings.
Minus 25 lines of code, no functional change.
When creation of the temporary tags file fails, call the pager
without the -T option, because otherwise the pager won't even start.
Fixing a bug reported by jca@.
While here, shorten the code by two lines
and delete one internal interface function.
Do not fork and exec gunzip(1), just link with libz instead.
As discussed with deraadt@, that's cleaner and will help tame(2).
Something like this was also suggested earlier by bapt at FreeBSD.
Minus 50 lines of code, deleting one interface function (mparse_wait),
no functional change intended.
Insist that manual page file name extensions must begin with a digit,
lest pkg.conf(5) be shown when pkg(5) is asked for;
issue reported by Michael Reed <m dot reed at mykolab dot com>.
Initial, still somewhat experimental implementation to leverage
less(1) -T and :t ctags(1)-like functionality to jump to the
definitions of various terms inside manual pages.
To be polished in the tree, so bear with me and report issues.
Technically, if less(1) is used as a pager, information is collected
by the mdoc(7) terminal formatter, first stored using the ohash
library, then ultimately written to a temporary file which is passed
to less via -T. No change intended for other output formatters or
when running without a pager.
Based on an idea from Kristaps using feedback from many, in particular
phessler@ nicm@ millert@ halex@ doug@ kspillner@ deraadt@.