We cannot easily control the order of the QUERY_STRING keys generated
by the search form, it's just the order of the fields in the form.
Actually, that's not too bad; the generated URI resembles the
generating form.
To minimize confusion for people looking at URIs, give the keys
in the same order when generating URIs for search listings and
search redirections, the latter being used instead of search
listings that would have only one single entry. Also, if the
manpath is the default, remove it form the generated URIs.
The names of all other struct query memebers match the corresponding
QUERY_STRING keys, so rename "expr" to "query".
Also add some missing function prototypes.
No functional change.
Rewrite http_parse() completely:
1. Make sure the last occurrence of each key is used, even if
it is empty, in which case it resets the value to the default.
2. When there is an HTTP encoding error, skip the affected
key-value pair only, but not all subsequent key-value pairs.
3. Do not modify a string returned from getenv(3).
4. Do not assume the NULL pointer is all null bits.
Sort result pages first by section number, then by name.
By moving the sort from cgi.c to mansearch.c, we get two advantages:
Easier access to the data needed for sorting, in particular the section
number, and the apropos(1) command line utility profits as well.
Provide a dropdown entry "All Architectures" and make it the default.
Still, amd64 remains the default in the following sense:
If a man(1) mode search returns more than one page of the same name,
prefer amd64 over other architectures for immediate display.
ok deraadt@ daniel@
Security fix:
After decoding numeric (\N) and one-character (\<, \> etc.)
character escape sequences, do not forget to HTML-encode the
resulting ASCII character. Malicious manuals were able to smuggle
XSS content by roff-escaping the HTML-special characters they need.
That's a classic bug type in many web applications, actually... :-(
Found myself while auditing the HTML formatter for safe output handling.
Security fix:
The function print_encode() is used both for plain text
and for quoted attribute values.
Escape the '"' character such that malicious manuals cannot pull off
XSS attacks using malformed .Lk, .Mt, .%U, and .UR macros (and maybe
others) to trigger the latter case.
In the former case, escaping does no harm.
Issue found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Security fix to prevent XSS attacks:
Restrict the character set of strings passed into html_alloc(),
in particular architecture names that come from the QUERY_STRING,
but also SCRIPT_NAME and manpath.conf content for additional safety,
and bail out safely on violations.
Issue reported by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Kristaps points out that the current HTTP/1.1 draft standard (RFC
2616) requires the Location: response-header field to be an absolute
URI (14.30), and only the most recent proposed standard (RFC 7231),
which is barely a month old, allows a relative Location: (7.1.2).
While most modern browsers appear to support relative Location:
headers, some may not, and it's maybe a bit early to rely on relative
Location: headers.
I'm not going back to the HTTP_HOST or SERVER_NAME CGI variables,
though. While some CGI programs certainly require those, in which
case both the CGI programmer and the web server admin have to be
very careful to keep the system secure and reliable, man.cgi(8)
does not really need them. We always know at compile time which
domain we are running for, and for man.cgi(8), security and reliability
are definitely much more important than flexibility. So make HTTP_HOST
a compile-time definition for now.
Security fix:
Validate the manpath up front and report a Bad Request if it is not
listed in manpath.conf, such that clients can't probe which directories
exist on the server. In case of configuration errors, consistently
report Internal Server Error without disclosing any further information.
Partially based on a patch from Sebastien Marie <semarie-openbsd at
latrappe dot fr>, but avoiding a couple of issues with that patch
and approaching the issue in a somewhat more rigorous way.
Security fix:
Validate the name of the file to show before opening it.
Only allow relative filenames starting with "man" or "cat"
and containing neither "/.." nor "../".
While here, correct the condition discarding an initial "./".
Vulnerability found by Sebastien Marie <semarie-openbsd at latrappe dot fr>.
Many thanks for sending a patch; however, i did not use it but made the
checks even stricter.
Do not use the HTTP_HOST CGI variable,
just make the HTTP redirect Location: relative.
Less user input is good, it reduces the attack surface.
Besides, this removes one global variable and 4 lines of code.
Patch from Sebastien Marie <semarie-openbsd at latrappe dot fr>.
When the MAN_DIR/manpath.conf configuration file does not exist or is empty,
log the problem, hand the pg_error_internal() error page to the client,
and exit(3) in a controlled way instead of stumbling on and segfaulting
later.
Patch from Sebastien Marie <semarie-openbsd at latrappe dot fr>,
messages tweaked by me.
Compatibility hack for the old "manpath=OpenBSD<blank>" query parameter format;
unfortunate, more than 400 links needing this are scattered all around
the www.openbsd.org website, and CVSweb needs this as well.
Make the calltree a bit easier to understand by giving the
functions that call resp_begin_html() names starting with "pg_"
and those called after resp_begin_html() names with "resp_".
No functional change, purely renaming functions.
By popular demand, bring man.cgi default mode closer to what man(1) does:
Even when there are multiple pages with the same name in different
sections, show one of them, using the same priorities as in the
default man.conf(5) file.
Unconfuse .Fa documentation:
You can use .Fa with just a type, without a name,
but when you give both, which is the usual case,
they need to go into one single .Fa argument.
Observed by bentley@; ok jmc@ bentley@.
Install the manuals of the web interface below the same directory
as manpath.conf, such that we do not need to mix our own documentation
into the documentation we are serving, which may not even be possible
if the latter is updated automatically.
No need for run-time configuration, add minimal compile-time
configuration facilities, just two paths and two HTML strings.
Show the title on all pages, not just the index page.
Simplify: Delete 74 lines of code including one enum type, one
global lookup table, two functions, two function arguments, one
struct member, one local variable, and the "search/" and "show/"
part of the URIs, all without losing functionality.
Distinguish between man(1) and apropos(1) mode by adding back the classical
QUERY_STRING variable "apropos=". Change the default back to "apropos=0".
Control it by adding a HTML <SELECT> element for it.
Rename the "expr=" QUERY_STRING variable back to its classical name "query=",
i don't see how the new name is better than the classical one.
While here, drop the concept of a "legacy mode". Simply continue to
support the features, and use what we consider best.
Fix whatis(1) to correctly match words instead of any substrings.
While here, also provide an internal mode (MANSEARCH_MAN) to match
complete names, to be used by man.cgi(8).
Almost everything in the old man.cgi(7) was outdated in one way
or another - catman, catman.conf, CACHE_DIR, /cache, manroots,
replacing '/' with spaces, /tmp...
Instead, document the HTML and URI interfaces, the output and the setup,
and complete the listings of ENVIRONMENT variables and FILES.
Using section 8 instead of section 7 because that's the usual place
for CGI programs, see for example bgplg(8) and slowcgi(8).
Clean up error reporting:
* Consistent naming and use of resp_* functions.
* Split resp_noresult() out of resp_search() and reuse it.
* Log information about internal errors.
* And some minor fixes.
namespace cleanups:
CGI variable: s/CACHE_DIR/MAN_DIR/ because it's static, not a cache
default MAN_DIR: /cache/man.cgi/ -> /man/ see above
global variable: s/cache/mandir/ see above
global variable: s/css/cssdir/ for consistency with mandir
global variable: s/host/httphost/ for consistency with HTTP_HOST
global variable: s/progname/scriptname/ for consistency with SCRIPT_NAME
struct query: member s/manroot/manpath/ for consistency with QUERY_STRING
Simplify pathgen() even more.
Let manpath.conf be a plain text list of the directories to use.
As a bonus, this makes the order configurable.
Get rid of <dirent.h>, opendir(3), readdir(3), stat(2).
Switch over man.cgi to SQLite. While here:
* Simplify pathgen(), just use the subdirs of the cache dir.
* Simplify URI paths, just use show/<manpath>/<filename>.
* Drop struct paths, just use plain strings.
* Garbage collect unused headers.
Simplify man_unscope(), removing 18 lines of code, that is,
removing one function argument, one function definition,
three function invocations and two pointless assert()s.
No functional change.
Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.
Fix expansion of escape sequences with incomplete arguments.
* For \* and \n, discard the incomplete arg, expand to empty string.
* For \B, discard the incomplete arg, expand to the digit 0.
* For \w, use the incomplete arg (behaviour unchanged).
Fix handling of escape sequences taking numeric arguments.
* Repair detection of invalid delimiters.
* Discard the invalid delimiter together with the invalid sequence.
Note to self: In general, strchr("\0...", c) is a thoroughly bad idea.
Cleanup with respect to bad macro arguments.
* Fix .Sm with invalid arg: move arg out and toggle mode.
* Promote "unknown standard" from WARNING to ERROR, it loses information.
* Delete MANDOCERR_BADWIDTH, it would only indicate a mandoc(1) bug.
* Do not report MANDOCERR_BL_LATETYPE when there is no type at all.
* Mention macro names, arguments and fallbacks.
Cleanup regarding -offset and -width:
* Bugfix: Last one wins, not first one.
* Fix .Bl -width without argument: it means 0n, so do not ignore it.
* Report macro names, argument names and fallbacks in related messages.
* Simplify: Garbage collect auxiliary variables in pre_bd() and pre_bl().
Clean up messages regarding excess arguments:
* Downgrade ".Bf -emphasis Em" from FATAL to WARNING.
* Mention the macros, the arguments, and the fallbacks.
* Hierarchical naming.
Clean up messages related to missing arguments.
* Do not warn about empty -column cells, they seem valid to me.
* Downgrade empty item and missing -std from ERROR to WARNING.
* Hierarchical naming.
* Descriptive, not imperative style.
* Mention macro names, argument names, and fallbacks.
* Garbage collect some unreachable code in post_it().
Fix formatting of empty .Bl -inset item heads.
Downgrade empty item heads from ERROR to WARNING.
Show the list type in the error message.
Choose better variable names for nodes in post_it().
MANDOCERR_NOARGS reported three completely unrelated classes of problems.
Split the roff(7) parts out of it and report the request names for these cases.
When .Sm is called without an argument, groff toggles the spacing mode,
so let us do the same for compatibility. Using this feature is of
course not recommended except in manual page obfuscation contests.
Disentangle the MANDOCERR_CHILD message, which reported three
completely different things, into three distinct messages.
Also mention the macro names we are talking about.
Clean up warnings related to macros and nesting.
* Hierarchical naming of enum mandocerr items.
* Improve the wording to make it comprehensible.
* Mention the offending macro.
* Garbage collect one chunk of ancient, long unreachable code.
Fix the column numbers associated with in_line_argn() macros;
this bug is more than four years old, introduced by kristaps@
in mdocml.bsd.lv rev. 1.46, March 30, 2010.
Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!
Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.
The previous commit to this file broke the control flow keywords \{ and \}
when they immediately follow a request or macro name, without intervening
whitespace. Minimal fix.
The lesson learnt here is that, despite their appearance, \{ and \} are
not escape sequences, so never skip them when parsing for names.
Ingo Schwarze [Sun, 29 Jun 2014 23:26:00 +0000 (23:26 +0000)]
Use the freshly improved roff_getname() function
for the main roff request parsing routine, roff_parse().
In request or macro invocations, escape sequences now terminate the
request or macro name; what follows is treated as arguments. Besides,
the names of user-defined macros can now contain backslashes (eek!).
Ingo Schwarze [Sun, 29 Jun 2014 22:38:47 +0000 (22:38 +0000)]
Use the freshly improved roff_getname() function
for the .de parsing routine, roff_block(),
to correctly handle names terminated by escape sequences.
Besides, this saves us 20 lines of code.
Ingo Schwarze [Sun, 29 Jun 2014 22:14:10 +0000 (22:14 +0000)]
Major roff_getname() cleanup.
* Return the name even if it is terminated by an escape sequence, not a blank.
* Skip the full escape sequence using mandoc_escape(), not just the first byte.
* Make it non-destructive, return the length instead of writing a '\0'.
* Let .ds and .as cope with the above changes to the internal interface.
* Fix .rm and .rr to accept an escape sequence as the end of a name.
* Fix .nr and .rr to not set/delete a register with an empty name.
Ingo Schwarze [Wed, 25 Jun 2014 00:20:19 +0000 (00:20 +0000)]
Improve messages related to the roff(7) .so request.
In all these messages, show the filename argument that was passed
to the .so request.
In case of failure, show an additional message reporting the file
and the line number where the failing request was found.
The existing message reporting the reason for the failure -
for example, "Permission denied" - is left in place, unchanged.
Inspired by a question asked by Nick@ after he saw the
confusing old messages that used to be emitted in this area.
Ingo Schwarze [Tue, 24 Jun 2014 21:43:08 +0000 (21:43 +0000)]
Deprecate .Tn and .Ux, and make it clearer that .Bt and .Ud are deprecated.
Do not use these macros in new documents, they provide no value.
Instead, usually no macro and no markup is needed at all.
Of course, they remain supported for compatibility with existing manuals.
Jason McIntyre (OpenBSD), Thomas Klausner (NetBSD) and
Franco Fichtner (DragonFly) are OK with this documentation change.
Ingo Schwarze [Sun, 22 Jun 2014 17:07:06 +0000 (17:07 +0000)]
Minimal COMPATIBILITY cleanup:
* Mention that the list is incomplete.
* I implemented %C for groff -current, and it was accepted.
* Font family is \F, not \f.
* Escapes and scaling widths are documented in roff(7), not here.
* Quoting quotes by doubling them is now supported.
Ingo Schwarze [Sun, 22 Jun 2014 16:39:45 +0000 (16:39 +0000)]
Minimal cleanup of the COMPATIBILITY section:
* Mention that the list is incomplete.
* Quoting quotes by doubling them is documented in the
Ossanna/Kernighan/Ritter Nroff/Troff User's Manual, Section 7.3.
* Our roff(7) manual documents handling of escape sequences;
besides, we partially support \w and \z now.
* Scaling widths are documented in roff(7) as well, and f is not \f.
* Negative arguments to .sp are handled now.
Ingo Schwarze [Sat, 21 Jun 2014 16:18:25 +0000 (16:18 +0000)]
Prefix messages about bad command line options and arguments
with "mandoc: " or "makewhatis: ", respectively,
similar to what we already do for other messages.
Ingo Schwarze [Fri, 20 Jun 2014 23:02:31 +0000 (23:02 +0000)]
As suggested by jmc@, only include line and column numbers into messages
when they are meaningful, to avoid confusing stuff like this:
$ mandoc /dev/null
mandoc: /dev/null:0:1: FATAL: not a manual
Instead, just say:
mandoc: /dev/null: FATAL: not a manual
Another example this applies to is documents having a prologue,
but lacking a body. Do not throw a FATAL error for these; instead,
issue a WARNING and show the empty document, in the man(7) case with
the same amount of blank lines as groff does. Also downgrade mdoc(7)
documents having content before the first .Sh from FATAL to WARNING.
Ingo Schwarze [Fri, 20 Jun 2014 17:24:00 +0000 (17:24 +0000)]
Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.
1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue
Ingo Schwarze [Fri, 20 Jun 2014 16:11:42 +0000 (16:11 +0000)]
Prefix error messages from mandoc(1) with "mandoc: "
just like almost all other utility programs do.
Suggested by nick@ who wondered where messages came from
when calling mandoc(1) from inside a Perl script.
ok jmc@ nick@
Ingo Schwarze [Fri, 20 Jun 2014 02:24:40 +0000 (02:24 +0000)]
Merge from OpenBSD - Marc Espie improved the ohash interface:
* rename the halloc callback to calloc, provide overflow protection
* rename the hfree callback to free, drop the useless size argument
* prevent integer overflows in ohash_resize
Ingo Schwarze [Fri, 20 Jun 2014 01:21:48 +0000 (01:21 +0000)]
More tweaking of set_basedir().
1) Do not error out when getcwd(3) fails, only fail when inaccessibility
of the cwd prevents processing of relative paths given on the command line.
2) Do not uselessly call set_basedir() twice in a row.
While fts_read(3) in treescan() does cause the cwd to jump around,
fts_close(3) is always called at the end, putting us back
where we came from. The -d/-u fallback code already relied on this.
Ingo Schwarze [Thu, 19 Jun 2014 00:45:37 +0000 (00:45 +0000)]
Some simple set_basedir() cleanup; more to come.
1) Refrain from calling set_basedir() in the -t case,
and do not attempt to strip anything from the file names in that case.
Testing individual files cannot reasonably have any notion of a base dir.
2) Remove the possibility of passing NULL to set_basedir().
It was dangerous because it was not idempotent, and it served no purpose
except closing a file descriptor right before exit(), which is pointless.
Besides, the file descriptor is likely to be removed completely, soon.
3) Make sure that /foobar isn't treated as a subdirectory of /foo;
this fixes a bug reported by espie@.
Ingo Schwarze [Wed, 18 Jun 2014 19:34:04 +0000 (19:34 +0000)]
Merge OpenBSD rev. 1.108 by sthen@; original commit message:
Don't display "unable to open mandoc.db" error messages (SQLITE_CANTOPEN)
in the code which opens mandocdb's sqlite database when updating/deleting
individual files (as used and only really useful for pkg_add/pkg_delete).
Ingo Schwarze [Wed, 7 May 2014 16:19:03 +0000 (16:19 +0000)]
Render roff escape sequences contained in manual page descriptions
before putting them into the mpages table.
Issue found by bentley@ in OpenBSD::Getopt(3p).