Fix a corner case where \H<nil> (where <nil> is the \0 character) would
cause mandoc_escape() to read past the end of an allocated string.
Found when a script scanning of all Mac OSX manual accidentally also
scanned binary (gzip'd) files, discussed with schwarze@ on tech@.
Ingo Schwarze [Sun, 17 Aug 2014 22:10:29 +0000 (22:10 +0000)]
While all current callers pass valid data to ascii_hspan() only,
it's safer to assume incoming enum data might be invalid
and catch it instead of happily returning an unitialized int.
No functional change right now.
Ingo Schwarze [Sun, 17 Aug 2014 20:53:50 +0000 (20:53 +0000)]
Do not require getsubopt() to provide extern char *suboptarg.
We don't use it anyway in mandoc. Like this, fewer systems need
the compat implementation. In particular, we can now use the stock
getsubopt() on glibc and musl.
Besides, the comment in the BSD getsubopt.c that error messages are
tricky without *suboptarg is massively overblown. If you simply
save a copy of the pointer you pass into getsubopt(), that's quite
usable for an error message.
People start campaigning for the addition of *suboptarg to C libraries
on the grounds that mandoc wants it, but actually, i consider library
functions manipulating global data quite ugly, so stop pushing people
into that questionable direction.
While here, add an explicit Copyright header to the test file.
While it's obviously to me what Kristaps intended, others might
consider this file copyrightable and wonder what's up.
Ingo Schwarze [Sun, 17 Aug 2014 16:44:41 +0000 (16:44 +0000)]
KNF: fix indentation of previous commit, see style(9):
"Indentation is an 8 character tab. Second level indents are four spaces."
All the rest of this file already conforms.
Protect against accessing "n->next->child" by first checking "n->next".
Noticed in a crash against ".It Nm Fo" with no closing "Fc".
Original patch expanded by schwarze@ then extended even more.
Ingo Schwarze [Sun, 17 Aug 2014 03:24:47 +0000 (03:24 +0000)]
Fully integrate apropos(1) into mandoc(1).
Switch the argmode on the progname, including man(1).
Provide -f and -k options to switch the argmode.
Store the argmode inside struct search, generalizing the flags.
Derive the deftype from the argmode when needed instead of storing it.
Store the outkey inside struct search instead of passing it alone.
While here, get rid of the trailing blanks in Makefile.depend.
Ingo Schwarze [Sat, 16 Aug 2014 23:04:25 +0000 (23:04 +0000)]
When BUILD_DB is active, link apropos(1) into the mandoc binary.
This is the first step on the way to a man(1) implementation.
The new ./configure is flexible enough to make this step quite easy.
Ingo Schwarze [Sat, 16 Aug 2014 19:50:37 +0000 (19:50 +0000)]
If a stray .It follows .El, we are no longer in the list,
even though the list is still the last processed macro.
This fixes a regression introduced in mdoc_macro.c rev. 1.138:
Ulrich Spoerlein <uqs at FreeBSD> reports that various of their
kernel manuals trigger assertions.
Ingo Schwarze [Sat, 16 Aug 2014 19:00:01 +0000 (19:00 +0000)]
Improve build system and autodetection.
* Make ./configure standalone, that's what people expect.
* Let people write a ./configure.local from scratch, not edit existing files.
* Autodetect wchar, sqlite3, and manpath and act accordingly.
* Autodetect the need for -L/usr/local/lib and -lutil.
* Get rid of config.h.p{re,ost}, let ./configure only write what's needed.
* Let ./configure write a Makefile.local snippet, that's quite flexible.
Ingo Schwarze [Thu, 14 Aug 2014 22:33:10 +0000 (22:33 +0000)]
Some compilers apparently worry that abort() might return
and then throw a "may be used uninitialized" warning, so
sprinkle some /* NOTREACHED */. No functional change.
Noticed by Thomas Klausner <wiz at NetBSD dot org>.
Ingo Schwarze [Thu, 14 Aug 2014 00:31:43 +0000 (00:31 +0000)]
Revert previous, as requested by kristaps@.
The .Bf block can contain subblocks, so it has to render as an
element that can contain flow content. But <em> cannot contain
flow content, only phrasing content. Rendering .Em and .Bf differently
would by unfortunate, and closing out .Bf before subblocks and
re-opening it afterwards would merely complicate both the C code
of the program and the generated HTML code. Besides, converting
.Em to semantic HTML markup would require some content to be put
into <em> and some into <i>, but we cannot automatically distinguish
which is which, so strictly speaking, we can't use semantic HTML
here but have to fall back to physical markup. Wonders of HTML...
Begin cleaning up scaling units.
Start with the horizontal terminal specifiers, making sure that they match
up with troff.
Then move on to PS, PDF, and HTML, noting that we stick to the terminal
default width for "u".
Lastly, fix some completely-wrong documentation and note that we diverge
from troff w/r/t "u".
Ingo Schwarze [Wed, 13 Aug 2014 15:25:22 +0000 (15:25 +0000)]
Use <em> for .Em and .Bf -emphasis.
The vast majority of .Em in real-world manuals is stress emphasis,
for which <em> is the correct markup. Admittedly, there are some
instances of .Em usage for alternate quality, for which <i> would
be a better match. Most of these are technical terms that neither
allow semantic markup nor are keywords - for the latter, .Sy would
be preferable. A typical example is that the shell breaks input into
.Em words .
Alternate voice or mood, which would also require <i>, is almost
absent from manuals.
We cannot satisfy both stress emphasis and alternate quality, so
pick the one that fits more often and looks less wrong when off.
Patch from Guy Harris <guy at alum dot mit dot edu>.
ok joerg@ bentley@
Ingo Schwarze [Tue, 12 Aug 2014 19:28:16 +0000 (19:28 +0000)]
In mdoc(7) and man(7), if a width is given as a bare number without
specifying a unit, the implied unit is 'n' (on the terminal, one
character position; in PostScript, half of the current font size
in points), not 'u' (roff output device basic unit). No functional
change right now, but important for the upcoming scaling unit fixes.
Ingo Schwarze [Tue, 12 Aug 2014 19:28:03 +0000 (19:28 +0000)]
The macro SCALE_HS_INIT() is always passed the result of strlen() or
an equivalent number as its argument, and strlen() measures the width
of a string in characters, not in basic units. No functional change
right now, but important for the upcoming scaling unit fixes.
Ingo Schwarze [Mon, 11 Aug 2014 01:39:00 +0000 (01:39 +0000)]
Provide a fallback version of fts(3) for systems lacking it.
I chose the OpenBSD version because it apparently contains various
bugfixes that never made it into libnbcompat. To reduce size and
complexity, i stripped out the features we don't need.
Ingo Schwarze [Sun, 10 Aug 2014 23:54:41 +0000 (23:54 +0000)]
Get rid of HAVE_CONFIG_H, it is always defined; idea from libnbcompat.
Include <sys/types.h> where needed, it does not belong in config.h.
Remove <stdio.h> from config.h; if it is missing somewhere, it should
be added, but i cannot find a *.c file where it is missing.
Ingo Schwarze [Fri, 8 Aug 2014 23:47:21 +0000 (23:47 +0000)]
Do not hardcode stuff in ./configure that is actually user-configurable
in the Makefile; instead, pass it down via the environment just
like CFLAGS.
Nice suggestion from kristaps@ hoping to make MacOS X happier.
Ingo Schwarze [Fri, 8 Aug 2014 23:43:47 +0000 (23:43 +0000)]
Delete the __attribute__((__bounded__(...))) annotation.
That's an OpenBSD-specific gcc-4.2.1 security extension.
It's certainly a bad idea to use such stuff in a compatibility header,
as other operating systems just won't understand it.
Ingo Schwarze [Fri, 8 Aug 2014 16:38:06 +0000 (16:38 +0000)]
When .Sm is called without an argument, groff toggles the spacing mode,
so let us do the same for compatibility. Using this feature is of
course not recommended except in manual page obfuscation contests.
Ingo Schwarze [Wed, 6 Aug 2014 15:09:05 +0000 (15:09 +0000)]
Bring the handling of defective prologues even closer to groff,
in particular relaxing the distinction between prologue and body
and further improving messages.
* The last .Dd wins and the last .Os wins, even in the body.
* The last .Dt before the first body macro wins.
* Missing title in .Dt defaults to UNTITLED. Warn about it.
* Missing section in .Dt does not default to 1. But warn about it.
* Do not warn multiple times about the same mdoc(7) prologue macro.
* Warn about missing .Os.
* Incomplete .TH defaults to empty strings. Warn about it.
Ingo Schwarze [Tue, 5 Aug 2014 14:43:10 +0000 (14:43 +0000)]
Absurdly, the return value of sqlite3_column_text()
is "const unsigned char *", which causes warnings with GCC on Linux.
Explicitly cast to "const char *" to avoid this.
Issue noticed by kristaps@.
Ingo Schwarze [Tue, 5 Aug 2014 12:50:52 +0000 (12:50 +0000)]
Since old SQLite versions do not have sqlite3_errstr(),
provide a dummy fallback implementation.
Do not bother to decode the error, SQLite error codes
are not useful enough for that to be worthwhile.
Note that using sqlite3_errmsg(db) would be a bad idea:
On malloc() failure, db is NULL, which would cause a segfault.
Issue noticed by kristaps@.
Ingo Schwarze [Tue, 5 Aug 2014 11:19:13 +0000 (11:19 +0000)]
Portability fix:
* POSIX syntax is 'include Makefile.depend', not '.include "Makefile.depend"'
* gmake(1) runs the build rule for the included file (duh), so delete the rule
* consequently, we have to mark the 'depend' maintainer target .PHONY
* as it's now .PHONY anyway, drop some prerequisites that are now useless
Issue noticed by kristaps@.
Ingo Schwarze [Tue, 5 Aug 2014 05:48:56 +0000 (05:48 +0000)]
Sync library documentation with reality.
Split mandoc_escape(3), mandoc_malloc(3), and mchars_alloc(3)
out of mandoc(3), adding lots of new information.
Ingo Schwarze [Tue, 5 Aug 2014 03:02:40 +0000 (03:02 +0000)]
Properly partition the build system and install some missing stuff:
* Introduce targets base-build, db-build, cgi-build.
* Introduce targets base-install, db-install, cgi-install.
* Introduce a BUILD_TARGETS variable to contain db-build and cgi-build.
* Introduce an INSTALL_TARGETS variable and fill it using BUILD_TARGETS.
* Install the whatis(1) and makewhatis(8) binaries.
* Install the apropos(1), whatis(1), and makewhatis(8) manuals.
* Install mandoc_aux.h.
* Do not build manpage(1) by default.
Ingo Schwarze [Tue, 5 Aug 2014 01:45:02 +0000 (01:45 +0000)]
Various minor corrections:
* Do not unconditionally use -I/usr/local/include and -L/usr/local/lib.
* Do not install programs and libs root-writeable.
* Add missing test-strcasestr.c and test-strsep.c to TESTSRCS.
* Add missing cgi.h.example and mandoc_html.3 to SRCS.
* Add missing mandoc_html.3.html to WWW_MANS.
Ingo Schwarze [Sat, 2 Aug 2014 00:02:42 +0000 (00:02 +0000)]
Simplify by allowing only one post-handler.
Saves 36 static arrays and 10 lines of code
at the expense of only five new trivial static functions.
No functional change.
Ingo Schwarze [Fri, 1 Aug 2014 21:24:17 +0000 (21:24 +0000)]
Simplify man(7) validation:
Drop pre-handlers, they were almost unused.
Drop the needless complexity of allowing more than one post-handler.
This saves one internal interface function, one static function, one
private struct definition, sixteen static arrays, and 45 lines of code.
No functional change.
Ingo Schwarze [Fri, 1 Aug 2014 19:38:29 +0000 (19:38 +0000)]
Fix floating point handling: When converting double to size_t,
properly round to the nearest M (=0.001m), which is the smallest
available unit.
This avoids weirdness like (size_t)(0.6 * 10.0) == 5
by instead calculating (size_t)(0.6 * 10.0 + 0.0005) == 6,
and so it fixes the indentation of the readline(3) manual.
Ingo Schwarze [Fri, 1 Aug 2014 19:25:52 +0000 (19:25 +0000)]
Clarity with respect to floating point handling:
Write double constants as double rather than integer literals.
Remove useless explicit (double) cast done at one place and nowhere else.
No functional change.
In .Bl -column, if some of the column width declarations are given
right after the -column argument and some at the very end of the
argument list, after some other arguments like -compact, concatenate
the column lists.
This gets rid of one of the last useless FATAL errors
and actually shortens the code by a few lines.
This fixes an issue introduced more than five years ago, at first
causing an assert() since mdoc_action.c rev. 1.14 (June 17, 2009),
then later a FATAL error since mdoc_validate rev. 1.130 (Nov. 30, 2010),
and marked as "TODO" ever since.
Remove the useless FATAL error "argument count wrong, violates syntax".
The last remaining instance was .It in .Bl -column with more than one
excessive .Ta. However, simply downgrading from FATAL to ERROR, it just
works fine, almost the same way as in groff, without any other changes.
Improve handling of next-line scope broken by end of file.
Detect the condition earlier, report in the error message
which block is broken, and delete the broken block.
Consequently, empty section headers can no longer happen.
Get rid of the useless FATAL error "child violates parent syntax".
When finding items outside lists, simply skip them and throw an ERROR.
Handle subsections before the first section instead of bailing out.
Remove two useless FATAL errors.
When a file contains neither text nor macros, treat it as an empty document.
When the mdoc(7) document prologue is incomplete, use some default values.
Various improvements related to .Ex and .Rv:
* let .Nm fall back to the empty string, not to UNKNOWN
* never let .Rv copy an argument from .Nm
* avoid spurious \fR after empty .Nm in -Tman
* correct handling of .Ex and .Rv in -Tman
* correct the wording of the output for .Rv without arguments
* use non-breaking spaces in .Ex and .Rv output where required
* split MANDOCERR_NONAME into a warning for .Ex and an error for .Nm
In groff, .Bd -centered operates in fill mode, which is relatively
hard to implement, while this implementation operates in non-fill
mode so far. As long as you pay attention that your lines do not
overflow, it works. To make sure that rendering is the same for
mandoc and groff, it is recommended to insert .br between lines
for now. This implementation will need improvement later.
Choosing the right encoding is a tricky business...
Printing query strings for URIs *always* needs URI-encoding, and when
embedding the URI into an HTML document, it needs replacement of
the "&" separators by "&" *in addition to that*, not instead.
Delete the function html_primtquery(), it was completely wrong.
You can see the badness by entering "mandoc &sec=2" into the query input
box before this patch and click "Submit". You come to the right page at
first (...man.cgi?query=mandoc+%26sec%3D2&apropos=0&sec=0&...), but now
the link to mandoc(1) is wrong: ...mandoc.1?query=mandoc &sec=2&...
Clicking on that, the "&sec=2" disappears from the query input box and
suddenly you have the first dropdown set to "2 - System Calls". Oops.
Sort the URI keys for .Xr links in the same order used by the search form,
and leave out the manpath when it is the default.
For building the HTML formatter options, do not use a static buffer.
We cannot easily control the order of the QUERY_STRING keys generated
by the search form, it's just the order of the fields in the form.
Actually, that's not too bad; the generated URI resembles the
generating form.
To minimize confusion for people looking at URIs, give the keys
in the same order when generating URIs for search listings and
search redirections, the latter being used instead of search
listings that would have only one single entry. Also, if the
manpath is the default, remove it form the generated URIs.
The names of all other struct query memebers match the corresponding
QUERY_STRING keys, so rename "expr" to "query".
Also add some missing function prototypes.
No functional change.
Rewrite http_parse() completely:
1. Make sure the last occurrence of each key is used, even if
it is empty, in which case it resets the value to the default.
2. When there is an HTTP encoding error, skip the affected
key-value pair only, but not all subsequent key-value pairs.
3. Do not modify a string returned from getenv(3).
4. Do not assume the NULL pointer is all null bits.
Sort result pages first by section number, then by name.
By moving the sort from cgi.c to mansearch.c, we get two advantages:
Easier access to the data needed for sorting, in particular the section
number, and the apropos(1) command line utility profits as well.
Provide a dropdown entry "All Architectures" and make it the default.
Still, amd64 remains the default in the following sense:
If a man(1) mode search returns more than one page of the same name,
prefer amd64 over other architectures for immediate display.
ok deraadt@ daniel@
Security fix:
After decoding numeric (\N) and one-character (\<, \> etc.)
character escape sequences, do not forget to HTML-encode the
resulting ASCII character. Malicious manuals were able to smuggle
XSS content by roff-escaping the HTML-special characters they need.
That's a classic bug type in many web applications, actually... :-(
Found myself while auditing the HTML formatter for safe output handling.
Security fix:
The function print_encode() is used both for plain text
and for quoted attribute values.
Escape the '"' character such that malicious manuals cannot pull off
XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe
others) to trigger the latter case.
In the former case, escaping does no harm.
Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Security fix to prevent XSS attacks:
Restrict the character set of strings passed into html_alloc(),
in particular architecture names that come from the QUERY_STRING,
but also SCRIPT_NAME and manpath.conf content for additional safety,
and bail out safely on violations.
Issue reported by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Kristaps points out that the current HTTP/1.1 draft standard (RFC
2616) requires the Location: response-header field to be an absolute
URI (14.30), and only the most recent proposed standard (RFC 7231),
which is barely a month old, allows a relative Location: (7.1.2).
While most modern browsers appear to support relative Location:
headers, some may not, and it's maybe a bit early to rely on relative
Location: headers.
I'm not going back to the HTTP_HOST or SERVER_NAME CGI variables,
though. While some CGI programs certainly require those, in which
case both the CGI programmer and the web server admin have to be
very careful to keep the system secure and reliable, man.cgi(8)
does not really need them. We always know at compile time which
domain we are running for, and for man.cgi(8), security and reliability
are definitely much more important than flexibility. So make HTTP_HOST
a compile-time definition for now.
Security fix:
Validate the manpath up front and report a Bad Request if it is not
listed in manpath.conf, such that clients can't probe which directories
exist on the server. In case of configuration errors, consistently
report Internal Server Error without disclosing any further information.
Partially based on a patch from Sebastien Marie <semarie-openbsd at
latrappe dot fr>, but avoiding a couple of issues with that patch
and approaching the issue in a somewhat more rigorous way.
Security fix:
Validate the name of the file to show before opening it.
Only allow relative filenames starting with "man" or "cat"
and containing neither "/.." nor "../".
While here, correct the condition discarding an initial "./".
Vulnerability found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Many thanks for sending a patch; however, i did not use it but made the
checks even stricter.
Do not use the HTTP_HOST CGI variable,
just make the HTTP redirect Location: relative.
Less user input is good, it reduces the attack surface.
Besides, this removes one global variable and 4 lines of code.
Patch from Sebastien Marie <semarie-openbsd at latrappe dot fr>.