Ingo Schwarze [Fri, 1 Aug 2014 19:38:29 +0000 (19:38 +0000)]
Fix floating point handling: When converting double to size_t,
properly round to the nearest M (=0.001m), which is the smallest
available unit.
This avoids weirdness like (size_t)(0.6 * 10.0) == 5
by instead calculating (size_t)(0.6 * 10.0 + 0.0005) == 6,
and so it fixes the indentation of the readline(3) manual.
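A minimal C sketch of the conversion, assuming a non-negative value that is already expressed in the target unit. The helper name dbl2size() is hypothetical and not taken from the mandoc sources. Adding half of the smallest unit before the truncating cast absorbs tiny representation errors. It does not change values that are already exact.

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert a non-negative double to size_t.
 * Adding half of the smallest meaningful unit (0.001) before the
 * truncating cast absorbs tiny representation errors, so a product
 * that should be exactly 6 but is computed as 5.99999... still
 * yields 6 rather than 5.
 */
static size_t
dbl2size(double v)
{
	return (size_t)(v + 0.0005);
}
```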
Ingo Schwarze [Fri, 1 Aug 2014 19:25:52 +0000 (19:25 +0000)]
Clarity with respect to floating point handling:
Write double constants as double rather than integer literals.
Remove useless explicit (double) cast done at one place and nowhere else.
No functional change.
In .Bl -column, if some of the column width declarations are given
right after the -column argument and some at the very end of the
argument list, after some other arguments like -compact, concatenate
the column lists.
This gets rid of one of the last useless FATAL errors
and actually shortens the code by a few lines.
This fixes an issue introduced more than five years ago, at first
causing an assert() since mdoc_action.c rev. 1.14 (June 17, 2009),
then later a FATAL error since mdoc_validate.c rev. 1.130 (Nov. 30, 2010),
and marked as "TODO" ever since.
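The following is a sketch of the concatenation in C. It assumes the column widths are kept as an array of string pointers. concat_columns() and its signature are hypothetical and are not the mdoc_validate.c code.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: append the column widths found at the end of
 * the argument list (late, nlate) to those given right after -column
 * (*cols, *ncols), instead of treating the split as a fatal error.
 * Returns 0 on success, -1 if memory allocation fails.
 */
static int
concat_columns(const char ***cols, size_t *ncols,
    const char **late, size_t nlate)
{
	const char **p;

	if (nlate == 0)
		return 0;
	p = realloc(*cols, (*ncols + nlate) * sizeof(*p));
	if (p == NULL)
		return -1;
	memcpy(p + *ncols, late, nlate * sizeof(*p));
	*cols = p;
	*ncols += nlate;
	return 0;
}
```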
Remove the useless FATAL error "argument count wrong, violates syntax".
The last remaining instance was .It in .Bl -column with more than one
excess .Ta. However, after simply downgrading it from FATAL to ERROR,
this case works fine, almost the same way as in groff, without any other changes.
Improve handling of next-line scope broken by end of file.
Detect the condition earlier, report in the error message
which block is broken, and delete the broken block.
Consequently, empty section headers can no longer happen.
Get rid of the useless FATAL error "child violates parent syntax".
When finding items outside lists, simply skip them and throw an ERROR.
Handle subsections before the first section instead of bailing out.
Remove two useless FATAL errors.
When a file contains neither text nor macros, treat it as an empty document.
When the mdoc(7) document prologue is incomplete, use some default values.
Various improvements related to .Ex and .Rv:
* let .Nm fall back to the empty string, not to UNKNOWN
* never let .Rv copy an argument from .Nm
* avoid spurious \fR after empty .Nm in -Tman
* correct handling of .Ex and .Rv in -Tman
* correct the wording of the output for .Rv without arguments
* use non-breaking spaces in .Ex and .Rv output where required
* split MANDOCERR_NONAME into a warning for .Ex and an error for .Nm
In groff, .Bd -centered operates in fill mode, which is relatively
hard to implement, while this implementation operates in non-fill
mode so far. As long as you pay attention that your lines do not
overflow, it works. To make sure that rendering is the same for
mandoc and groff, it is recommended to insert .br between lines
for now. This implementation will need improvement later.
Choosing the right encoding is a tricky business...
Printing query strings for URIs *always* needs URI-encoding, and when
embedding the URI into an HTML document, it needs replacement of
the "&" separators by "&amp;" *in addition to that*, not instead.
Delete the function html_printquery(), it was completely wrong.
You can see the badness by entering "mandoc &sec=2" into the query input
box before this patch and clicking "Submit". You come to the right page at
first (...man.cgi?query=mandoc+%26sec%3D2&apropos=0&sec=0&...), but now
the link to mandoc(1) is wrong: ...mandoc.1?query=mandoc &sec=2&...
Clicking on that, the "&sec=2" disappears from the query input box and
suddenly you have the first dropdown set to "2 - System Calls". Oops.
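The following C sketch shows both encoding steps in order. The query value is URI-encoded first. Then, because the link goes into an HTML attribute, the "&" between key-value pairs is written as "&amp;". Both helper names and the "man.cgi" link layout are hypothetical, not the actual cgi.c code. The expected encoding of "mandoc &sec=2" matches the URI quoted above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/*
 * URI-encode a query value: unreserved characters are copied,
 * a blank becomes '+', everything else becomes %XX.
 * Returns the output length, or -1 if dst is too small.
 */
static int
uri_encode(char *dst, size_t dstsz, const char *src)
{
	static const char hex[] = "0123456789ABCDEF";
	size_t len = 0;
	unsigned char c;

	for (; (c = (unsigned char)*src) != '\0'; src++) {
		if (len + 4 > dstsz)	/* room for "%XX" and the NUL */
			return -1;
		if (isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~')
			dst[len++] = (char)c;
		else if (c == ' ')
			dst[len++] = '+';
		else {
			dst[len++] = '%';
			dst[len++] = hex[c >> 4];
			dst[len++] = hex[c & 0x0f];
		}
	}
	if (len >= dstsz)
		return -1;
	dst[len] = '\0';
	return (int)len;
}

/*
 * Hypothetical link builder for an HTML page: URI-encode the value
 * first, then write the separator between key-value pairs as "&amp;".
 * Both steps are needed; neither one replaces the other.
 */
static int
html_query_link(char *dst, size_t dstsz, const char *query, int sec)
{
	char enc[256];
	int n;

	if (uri_encode(enc, sizeof(enc), query) == -1)
		return -1;
	n = snprintf(dst, dstsz, "man.cgi?query=%s&amp;sec=%d", enc, sec);
	return (n < 0 || (size_t)n >= dstsz) ? -1 : n;
}
```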
Sort the URI keys for .Xr links in the same order used by the search form,
and leave out the manpath when it is the default.
For building the HTML formatter options, do not use a static buffer.
We cannot easily control the order of the QUERY_STRING keys generated
by the search form, it's just the order of the fields in the form.
Actually, that's not too bad; the generated URI resembles the
generating form.
To minimize confusion for people looking at URIs, give the keys
in the same order when generating URIs for search listings and
search redirections, the latter being used instead of search
listings that would have only one single entry. Also, if the
manpath is the default, remove it from the generated URIs.
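The following sketch generates the query string in a fixed key order and leaves out the manpath when it is the default. The struct, the helper name, and the key order (query, sec, arch, manpath) are assumptions for illustration only. The values are also assumed to be URI-encoded already. When such a string is embedded in HTML, each "&" must additionally be written as "&amp;".

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical search query; field names mirror the QUERY_STRING keys. */
struct query {
	const char *manpath;
	const char *arch;
	const char *sec;
	const char *query;
};

/*
 * Write the keys in one fixed order, assumed here to be the order of
 * the fields in the search form, and omit the manpath when it equals
 * the default.  Returns the length, or -1 on truncation.
 */
static int
query_string(char *dst, size_t dstsz, const struct query *q,
    const char *defpath)
{
	int n;

	if (strcmp(q->manpath, defpath) == 0)
		n = snprintf(dst, dstsz, "query=%s&sec=%s&arch=%s",
		    q->query, q->sec, q->arch);
	else
		n = snprintf(dst, dstsz, "query=%s&sec=%s&arch=%s&manpath=%s",
		    q->query, q->sec, q->arch, q->manpath);
	return (n < 0 || (size_t)n >= dstsz) ? -1 : n;
}
```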
The names of all other struct query members match the corresponding
QUERY_STRING keys, so rename "expr" to "query".
Also add some missing function prototypes.
No functional change.
Rewrite http_parse() completely:
1. Make sure the last occurrence of each key is used, even if
it is empty, in which case it resets the value to the default.
2. When there is an HTTP encoding error, skip the affected
key-value pair only, but not all subsequent key-value pairs.
3. Do not modify a string returned from getenv(3).
4. Do not assume the NULL pointer is all null bits.
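The following C sketch follows the four rules above. The struct req fields and the function names parse_query_string() and url_decode() are hypothetical; this is not the code of the rewritten http_parse(). It recognizes only two keys to stay short.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed request; NULL means "use the default". */
struct req {
	char	*query;
	char	*sec;
};

/* Decode '+' and %XX in place; return -1 on a malformed escape. */
static int
url_decode(char *p)
{
	char	*q = p;
	char	 hex[3];

	for (; *p != '\0'; p++, q++) {
		if (*p == '+')
			*q = ' ';
		else if (*p == '%') {
			if (!isxdigit((unsigned char)p[1]) ||
			    !isxdigit((unsigned char)p[2]))
				return -1;
			hex[0] = p[1];
			hex[1] = p[2];
			hex[2] = '\0';
			if ((*q = (char)strtol(hex, NULL, 16)) == '\0')
				return -1;	/* reject an encoded NUL */
			p += 2;
		} else
			*q = *p;
	}
	*q = '\0';
	return 0;
}

static void
parse_query_string(struct req *req, const char *qs)
{
	char	*copy, *pair, *next, *val, **slot;
	size_t	 sz;

	/* Rule 4: assign NULL explicitly instead of zeroing with memset(3). */
	req->query = NULL;
	req->sec = NULL;
	if (qs == NULL)
		return;

	/* Rule 3: work on a copy, never on the string from getenv(3). */
	sz = strlen(qs) + 1;
	if ((copy = malloc(sz)) == NULL)
		return;
	memcpy(copy, qs, sz);

	for (pair = copy; pair != NULL; pair = next) {
		if ((next = strchr(pair, '&')) != NULL)
			*next++ = '\0';
		if ((val = strchr(pair, '=')) == NULL)
			continue;
		*val++ = '\0';
		/* Rule 2: on an encoding error, skip this pair only. */
		if (url_decode(pair) == -1 || url_decode(val) == -1)
			continue;
		if (strcmp(pair, "query") == 0)
			slot = &req->query;
		else if (strcmp(pair, "sec") == 0)
			slot = &req->sec;
		else
			continue;
		/* Rule 1: the last occurrence wins; empty resets to NULL. */
		free(*slot);
		*slot = NULL;
		if (*val != '\0' && (*slot = malloc(strlen(val) + 1)) != NULL)
			strcpy(*slot, val);
	}
	free(copy);
}
```

In a CGI program, the input would typically come from parse_query_string(&req, getenv("QUERY_STRING")).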
Sort result pages first by section number, then by name.
By moving the sort from cgi.c to mansearch.c, we get two advantages:
Easier access to the data needed for sorting, in particular the section
number, and the apropos(1) command line utility profits as well.
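A sketch of the ordering as a qsort(3) comparator, assuming each result carries its section and name as strings. struct page and page_cmp() are hypothetical names, not the mansearch.c code. Comparing sections as plain strings is also an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical search result: only the fields used for sorting. */
struct page {
	const char *name;
	const char *sec;	/* e.g. "1", "3p", "8" */
};

/*
 * qsort(3) comparator: order by section first, then by name.
 * Sections are compared as strings, which works for single-digit
 * sections with optional suffixes.
 */
static int
page_cmp(const void *a, const void *b)
{
	const struct page *pa = a, *pb = b;
	int c;

	if ((c = strcmp(pa->sec, pb->sec)) != 0)
		return c;
	return strcmp(pa->name, pb->name);
}
```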
Provide a dropdown entry "All Architectures" and make it the default.
Still, amd64 remains the default in the following sense:
If a man(1) mode search returns more than one page of the same name,
prefer amd64 over other architectures for immediate display.
ok deraadt@ daniel@
Security fix:
After decoding numeric (\N) and one-character (\<, \> etc.)
character escape sequences, do not forget to HTML-encode the
resulting ASCII character. Malicious manuals were able to smuggle
XSS content by roff-escaping the HTML-special characters they need.
That's a classic bug type in many web applications, actually... :-(
Found myself while auditing the HTML formatter for safe output handling.
Security fix:
The function print_encode() is used both for plain text
and for quoted attribute values.
Escape the '"' character such that malicious manuals cannot pull off
XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe
others) to trigger the latter case.
In the former case, escaping does no harm.
Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
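A sketch of an encoder that is safe in both contexts. html_encode() is a hypothetical name and is not mandoc's print_encode(). Characters that come out of decoding roff escapes such as \N must pass through the same encoder before being printed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: copy s to dst with the HTML-special
 * characters replaced by entities.  Escaping '"' as well makes the
 * output safe both as element text and inside a double-quoted
 * attribute value; in plain text, the extra escaping does no harm.
 * Returns the output length, or -1 if dst is too small.
 */
static int
html_encode(char *dst, size_t dstsz, const char *s)
{
	size_t len = 0, n;
	const char *rep;
	char one[2] = { '\0', '\0' };

	for (; *s != '\0'; s++) {
		switch (*s) {
		case '<':
			rep = "&lt;";
			break;
		case '>':
			rep = "&gt;";
			break;
		case '&':
			rep = "&amp;";
			break;
		case '"':
			rep = "&quot;";
			break;
		default:
			one[0] = *s;
			rep = one;
			break;
		}
		n = strlen(rep);
		if (len + n >= dstsz)	/* keep room for the NUL */
			return -1;
		memcpy(dst + len, rep, n);
		len += n;
	}
	if (len >= dstsz)
		return -1;
	dst[len] = '\0';
	return (int)len;
}
```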
Security fix to prevent XSS attacks:
Restrict the character set of strings passed into html_alloc(),
in particular architecture names that come from the QUERY_STRING,
but also SCRIPT_NAME and manpath.conf content for additional safety,
and bail out safely on violations.
Issue reported by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
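A sketch of such a character-set restriction, assuming a whitelist of letters, digits, and "-._/". The exact allowed set and the name is_safe_string() are hypothetical. On a violation, the caller would bail out instead of printing the string.

```c
#include <ctype.h>

/*
 * Hypothetical check: accept only characters that are harmless in
 * HTML and in URIs.  '/' is allowed so that SCRIPT_NAME passes.
 * Returns 1 if every character is allowed, 0 otherwise.
 */
static int
is_safe_string(const char *s)
{
	unsigned char c;

	for (; (c = (unsigned char)*s) != '\0'; s++)
		if (!isalnum(c) && c != '-' && c != '.' && c != '_' && c != '/')
			return 0;
	return 1;
}
```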
Kristaps points out that the current HTTP/1.1 draft standard (RFC
2616) requires the Location: response-header field to be an absolute
URI (14.30), and only the most recent proposed standard (RFC 7231),
which is barely a month old, allows a relative Location: (7.1.2).
While most modern browsers appear to support relative Location:
headers, some may not, and it's maybe a bit early to rely on relative
Location: headers.
I'm not going back to the HTTP_HOST or SERVER_NAME CGI variables,
though. While some CGI programs certainly require those, in which
case both the CGI programmer and the web server admin have to be
very careful to keep the system secure and reliable, man.cgi(8)
does not really need them. We always know at compile time which
domain we are running for, and for man.cgi(8), security and reliability
are definitely much more important than flexibility. So make HTTP_HOST
a compile-time definition for now.
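A sketch of a compile-time host combined with an absolute Location: header. The default "man.example.org", the helper location_header(), and the URI layout are hypothetical. A real build would set HTTP_HOST on the compiler command line.

```c
#include <stdio.h>

/*
 * Hypothetical compile-time configuration; override with
 * -DHTTP_HOST='"man.example.org"' when building.
 */
#ifndef HTTP_HOST
#define HTTP_HOST "man.example.org"
#endif

/*
 * Format an absolute Location: header from the compile-time host and
 * paths chosen by the program, never from client-supplied variables
 * like HTTP_HOST.  Returns the length, or -1 on truncation.
 */
static int
location_header(char *dst, size_t dstsz, const char *scriptname,
    const char *path)
{
	int n;

	n = snprintf(dst, dstsz, "Location: http://%s%s/%s\r\n",
	    HTTP_HOST, scriptname, path);
	return (n < 0 || (size_t)n >= dstsz) ? -1 : n;
}
```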
Security fix:
Validate the manpath up front and report a Bad Request if it is not
listed in manpath.conf, such that clients can't probe which directories
exist on the server. In case of configuration errors, consistently
report Internal Server Error without disclosing any further information.
Partially based on a patch from Sebastien Marie <semarie-openbsd at
latrappe dot fr>, but avoiding a couple of issues with that patch
and approaching the issue in a somewhat more rigorous way.
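A sketch of the up-front check, assuming the entries of manpath.conf have already been read into an array. validate_manpath() is a hypothetical name. The caller would answer 400 Bad Request when it returns 0.

```c
#include <string.h>

/*
 * Hypothetical check: accept a manpath from the QUERY_STRING only if
 * it exactly matches one of the configured entries, so clients
 * cannot probe for arbitrary directories on the server.
 */
static int
validate_manpath(const char *manpath, const char *const *conf, size_t nconf)
{
	size_t i;

	for (i = 0; i < nconf; i++)
		if (strcmp(manpath, conf[i]) == 0)
			return 1;
	return 0;
}
```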
Security fix:
Validate the name of the file to show before opening it.
Only allow relative filenames starting with "man" or "cat"
and containing neither "/.." nor "../".
While here, correct the condition discarding an initial "./".
Vulnerability found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Many thanks for sending a patch; however, i did not use it but made the
checks even stricter.
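A sketch of the filename rules listed above. validate_filename() is a hypothetical name, and skipping exactly one leading "./" is an assumption. Requiring a "man" or "cat" prefix also rules out absolute paths.

```c
#include <string.h>

/*
 * Hypothetical check: skip one leading "./", then require the name
 * to start with "man" or "cat" and to contain neither "/.." nor
 * "../".  Returns the name to open, or NULL if it is rejected.
 */
static const char *
validate_filename(const char *file)
{
	if (file[0] == '.' && file[1] == '/')
		file += 2;
	if (strncmp(file, "man", 3) != 0 && strncmp(file, "cat", 3) != 0)
		return NULL;
	if (strstr(file, "/..") != NULL || strstr(file, "../") != NULL)
		return NULL;
	return file;
}
```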
Do not use the HTTP_HOST CGI variable,
just make the HTTP redirect Location: relative.
Less user input is good, it reduces the attack surface.
Besides, this removes one global variable and 4 lines of code.
Patch from Sebastien Marie <semarie-openbsd at latrappe dot fr>.
When the MAN_DIR/manpath.conf configuration file does not exist or is empty,
log the problem, hand the pg_error_internal() error page to the client,
and exit(3) in a controlled way instead of stumbling on and segfaulting
later.
Patch from Sebastien Marie <semarie-openbsd at latrappe dot fr>,
messages tweaked by me.
Compatibility hack for the old "manpath=OpenBSD<blank>" query parameter format;
unfortunately, more than 400 links needing this are scattered all around
the www.openbsd.org website, and CVSweb needs this as well.
Make the calltree a bit easier to understand by giving the
functions that call resp_begin_html() names starting with "pg_"
and those called after resp_begin_html() names with "resp_".
No functional change, purely renaming functions.
By popular demand, bring man.cgi default mode closer to what man(1) does:
Even when there are multiple pages with the same name in different
sections, show one of them, using the same priorities as in the
default man.conf(5) file.
Unconfuse .Fa documentation:
You can use .Fa with just a type, without a name,
but when you give both, which is the usual case,
they need to go into one single .Fa argument.
Observed by bentley@; ok jmc@ bentley@.
Install the manuals of the web interface below the same directory
as manpath.conf, such that we do not need to mix our own documentation
into the documentation we are serving, which may not even be possible
if the latter is updated automatically.
No need for run-time configuration, add minimal compile-time
configuration facilities, just two paths and two HTML strings.
Show the title on all pages, not just the index page.
Simplify: Delete 74 lines of code including one enum type, one
global lookup table, two functions, two function arguments, one
struct member, one local variable, and the "search/" and "show/"
part of the URIs, all without losing functionality.
Distinguish between man(1) and apropos(1) mode by adding back the classical
QUERY_STRING variable "apropos=". Change the default back to "apropos=0".
Control it by adding an HTML <SELECT> element for it.
Rename the "expr=" QUERY_STRING variable back to its classical name "query=",
i don't see how the new name is better than the classical one.
While here, drop the concept of a "legacy mode". Simply continue to
support the features, and use what we consider best.
Fix whatis(1) to correctly match words instead of any substrings.
While here, also provide an internal mode (MANSEARCH_MAN) to match
complete names, to be used by man.cgi(8).
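A sketch of whole-word matching as opposed to substring matching. match_word() is hypothetical and not the mansearch.c code. In this sketch, a word boundary is any character other than a letter, digit, or underscore. An exact-name mode such as MANSEARCH_MAN would instead compare complete names, for example with strcmp(3).

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if word occurs in text as a complete
 * word, that is, neither preceded nor followed by a letter, digit,
 * or underscore.  A plain strstr(3) would also accept "cat" inside
 * "concatenate".
 */
static int
match_word(const char *text, const char *word)
{
	size_t len = strlen(word);
	const char *p = text;

	if (len == 0)
		return 0;
	while ((p = strstr(p, word)) != NULL) {
		int before = p > text &&
		    (isalnum((unsigned char)p[-1]) || p[-1] == '_');
		int after = isalnum((unsigned char)p[len]) || p[len] == '_';

		if (!before && !after)
			return 1;
		p++;
	}
	return 0;
}
```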
Almost everything in the old man.cgi(7) was outdated in one way
or another - catman, catman.conf, CACHE_DIR, /cache, manroots,
replacing '/' with spaces, /tmp...
Instead, document the HTML and URI interfaces, the output and the setup,
and complete the listings of ENVIRONMENT variables and FILES.
Using section 8 instead of section 7 because that's the usual place
for CGI programs, see for example bgplg(8) and slowcgi(8).
Clean up error reporting:
* Consistent naming and use of resp_* functions.
* Split resp_noresult() out of resp_search() and reuse it.
* Log information about internal errors.
* And some minor fixes.
namespace cleanups:
CGI variable: s/CACHE_DIR/MAN_DIR/ because it's static, not a cache
default MAN_DIR: /cache/man.cgi/ -> /man/ see above
global variable: s/cache/mandir/ see above
global variable: s/css/cssdir/ for consistency with mandir
global variable: s/host/httphost/ for consistency with HTTP_HOST
global variable: s/progname/scriptname/ for consistency with SCRIPT_NAME
struct query: member s/manroot/manpath/ for consistency with QUERY_STRING
Simplify pathgen() even more.
Let manpath.conf be a plain text list of the directories to use.
As a bonus, this makes the order configurable.
Get rid of <dirent.h>, opendir(3), readdir(3), stat(2).
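A sketch of reading such a plain text list, assuming one directory per line and ignoring empty lines. read_manpaths() is a hypothetical name, and lines longer than the buffer are not handled. Because the entries are kept in file order, editing the file changes the order.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical reader: store up to maxdirs directory names from fp,
 * in file order, skipping empty lines.  Returns the number stored;
 * the caller frees each entry.
 */
static size_t
read_manpaths(FILE *fp, char **dirs, size_t maxdirs)
{
	char line[1024];
	size_t n = 0, len;

	while (n < maxdirs && fgets(line, sizeof(line), fp) != NULL) {
		len = strcspn(line, "\r\n");	/* strip the line ending */
		line[len] = '\0';
		if (len == 0)
			continue;
		if ((dirs[n] = malloc(len + 1)) == NULL)
			break;
		memcpy(dirs[n], line, len + 1);
		n++;
	}
	return n;
}
```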
Switch over man.cgi to SQLite. While here:
* Simplify pathgen(), just use the subdirs of the cache dir.
* Simplify URI paths, just use show/<manpath>/<filename>.
* Drop struct paths, just use plain strings.
* Garbage collect unused headers.
Simplify man_unscope(), removing 18 lines of code, that is,
removing one function argument, one function definition,
three function invocations and two pointless assert()s.
No functional change.
Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.
Fix expansion of escape sequences with incomplete arguments.
* For \* and \n, discard the incomplete arg, expand to empty string.
* For \B, discard the incomplete arg, expand to the digit 0.
* For \w, use the incomplete arg (behaviour unchanged).
Fix handling of escape sequences taking numeric arguments.
* Repair detection of invalid delimiters.
* Discard the invalid delimiter together with the invalid sequence.
Note to self: In general, strchr("\0...", c) is a thoroughly bad idea.
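A short C illustration of why this is a bad idea. strchr(3) stops at the first NUL in the string, so every character listed after a leading "\0" is never examined. The delimiter set used here is hypothetical and chosen only for illustration.

```c
#include <string.h>

/* Intended: NUL, blank, or apostrophe.  Actually matches only NUL. */
static int
is_delim_broken(char c)
{
	return strchr("\0 '", c) != NULL;
}

/*
 * Test for NUL explicitly.  strchr(3) also matches c == '\0' through
 * the terminator, so the explicit test keeps the intent obvious.
 */
static int
is_delim(char c)
{
	return c == '\0' || strchr(" '", c) != NULL;
}
```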
Cleanup with respect to bad macro arguments.
* Fix .Sm with invalid arg: move arg out and toggle mode.
* Promote "unknown standard" from WARNING to ERROR, it loses information.
* Delete MANDOCERR_BADWIDTH, it would only indicate a mandoc(1) bug.
* Do not report MANDOCERR_BL_LATETYPE when there is no type at all.
* Mention macro names, arguments and fallbacks.
Cleanup regarding -offset and -width:
* Bugfix: Last one wins, not first one.
* Fix .Bl -width without argument: it means 0n, so do not ignore it.
* Report macro names, argument names and fallbacks in related messages.
* Simplify: Garbage collect auxiliary variables in pre_bd() and pre_bl().
Clean up messages regarding excess arguments:
* Downgrade ".Bf -emphasis Em" from FATAL to WARNING.
* Mention the macros, the arguments, and the fallbacks.
* Hierarchical naming.
Clean up messages related to missing arguments.
* Do not warn about empty -column cells, they seem valid to me.
* Downgrade empty item and missing -std from ERROR to WARNING.
* Hierarchical naming.
* Descriptive, not imperative style.
* Mention macro names, argument names, and fallbacks.
* Garbage collect some unreachable code in post_it().
Fix formatting of empty .Bl -inset item heads.
Downgrade empty item heads from ERROR to WARNING.
Show the list type in the error message.
Choose better variable names for nodes in post_it().
MANDOCERR_NOARGS reported three completely unrelated classes of problems.
Split the roff(7) parts out of it and report the request names for these cases.
When .Sm is called without an argument, groff toggles the spacing mode,
so let us do the same for compatibility. Using this feature is of
course not recommended except in manual page obfuscation contests.
Disentangle the MANDOCERR_CHILD message, which reported three
completely different things, into three distinct messages.
Also mention the macro names we are talking about.
Clean up warnings related to macros and nesting.
* Hierarchical naming of enum mandocerr items.
* Improve the wording to make it comprehensible.
* Mention the offending macro.
* Garbage collect one chunk of ancient, long unreachable code.
Fix the column numbers associated with in_line_argn() macros;
this bug is more than four years old, introduced by kristaps@
in mdocml.bsd.lv rev. 1.46, March 30, 2010.
Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!
Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.
The previous commit to this file broke the control flow keywords \{ and \}
when they immediately follow a request or macro name, without intervening
whitespace. Minimal fix.
The lesson learnt here is that, despite their appearance, \{ and \} are
not escape sequences, so never skip them when parsing for names.
Ingo Schwarze [Sun, 29 Jun 2014 23:26:00 +0000 (23:26 +0000)]
Use the freshly improved roff_getname() function
for the main roff request parsing routine, roff_parse().
In request or macro invocations, escape sequences now terminate the
request or macro name; what follows is treated as arguments. Besides,
the names of user-defined macros can now contain backslashes (eek!).
Ingo Schwarze [Sun, 29 Jun 2014 22:38:47 +0000 (22:38 +0000)]
Use the freshly improved roff_getname() function
for the .de parsing routine, roff_block(),
to correctly handle names terminated by escape sequences.
Besides, this saves us 20 lines of code.