Ingo Schwarze [Fri, 20 Jun 2014 17:24:00 +0000 (17:24 +0000)]
Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.
1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue
Ingo Schwarze [Fri, 20 Jun 2014 16:11:42 +0000 (16:11 +0000)]
Prefix error messages from mandoc(1) with "mandoc: "
just like almost all other utility programs do.
Suggested by nick@ who wondered where messages came from
when calling mandoc(1) from inside a Perl script.
ok jmc@ nick@
Ingo Schwarze [Fri, 20 Jun 2014 02:24:40 +0000 (02:24 +0000)]
Merge from OpenBSD - Marc Espie improved the ohash interface:
* rename the halloc callback to calloc, provide overflow protection
* rename the hfree callback to free, drop the useless size argument
* prevent integer overflows in ohash_resize
Ingo Schwarze [Fri, 20 Jun 2014 01:21:48 +0000 (01:21 +0000)]
More tweaking of set_basedir().
1) Do not error out when getcwd(3) fails, only fail when inaccessibility
of the cwd prevents processing of relative paths given on the command line.
2) Do not uselessly call set_basedir() twice in a row.
While fts_read(3) in treescan() does cause the cwd to jump around,
fts_close(3) is always called at the end, putting us back
where we came from. The -d/-u fallback code already relied on this.
Ingo Schwarze [Thu, 19 Jun 2014 00:45:37 +0000 (00:45 +0000)]
Some simple set_basedir() cleanup; more to come.
1) Refrain from calling set_basedir() in the -t case,
and do not attempt to strip anything from the file names in that case.
Testing individual files cannot reasonably have any notion of a base dir.
2) Remove the possibility of passing NULL to set_basedir().
It was dangerous because it was not idempotent, and it served no purpose
except closing a file descriptor right before exit(), which is pointless.
Besides, the file descriptor is likely to be removed completely, soon.
3) Make sure that /foobar isn't treated as a subdirectory of /foo;
this fixes a bug reported by espie@.
Ingo Schwarze [Wed, 18 Jun 2014 19:34:04 +0000 (19:34 +0000)]
Merge OpenBSD rev. 1.108 by sthen@; original commit message:
Don't display "unable to open mandoc.db" error messages (SQLITE_CANTOPEN)
in the code which opens mandocdb's sqlite database when updating/deleting
individual files (as used and only really useful for pkg_add/pkg_delete).
Ingo Schwarze [Wed, 7 May 2014 16:19:03 +0000 (16:19 +0000)]
Render roff escape sequences contained in manual page descriptions
before putting them into the mpages table.
Issue found by bentley@ in OpenBSD::Getopt(3p).
Improve error handling in dbopen(). If PRAGMA SQL statements fail,
report the error, close the database, and return failure from dbopen(),
such that the main program can recover and rebuild the database.
As noticed by stsp@, this can happen when database files are
accessible, but corrupt or in the wrong format, which will now
automatically be repaired.
Besides, use a safer idiom after sqlite3_open*() failure that also
handles out-of-memory situations correctly, and do not forget to
close the database after CREATE TABLE failure.
OMRON used uppercase for the model names of their Motorola 88100 LUNA
workstations, so show the kernel architecture names in uppercase
to the user, too.
Based on a patch from Kenji Aoyama@, thanks!
Fix a minor optimization i broke in rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
Reduce the verbosity of makewhatis -t:
In the past, it always showed the title lines of the files processed.
Now, it only shows them when called with -D.
That is better because pkg_create calls makewhatis -t.
It is also more consistent with -D behaviour in non- -t modes.
Issue reported by ajacoutot@; ok espie@ ajacoutot@ jasper@.
Various Makefile improvements:
* Use sha256 rather than md5.
* Update .h dependencies for some objects.
* Provide `www' target to build everything needed for the web site.
* Move .SUFFIXES and .PHONY technicalities to the bottom.
* State Copyright and license, just for clarity.
Audit malloc(3)/calloc(3)/realloc(3) usage.
* Change eight reallocs to reallocarray to be safe from overflows.
* Change one malloc to reallocarray to be safe from overflows.
* Change one calloc to reallocarray, no zeroing needed.
* Change the order of arguments of three callocs (aesthetical).
Audit strlcpy(3)/strlcat(3) usage:
* Add missing truncation checks to three calls.
* In four cases where we know that the distination buffer is large enough,
cast the return vailue to (void).
* Repair three instances of silent truncation, use asprintf(3).
* Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+...
to use asprintf(3) instead to make them less error prone.
* Cast the return value of four instances where the destination
buffer is known to be large enough to (void).
* Completely remove three useless instances of strlcpy(3)/strlcat(3).
* Mark two places in -Thtml with XXX that can cause information loss
and crashes but are not easy to fix, requiring design changes of
some internal interfaces.
* The file mandocdb.c remains to be audited.
in debug messages, truncating strings of excessive lengths is actually
a good thing, so cast the return value from sprintf to (void);
this concludes the mandoc sprintf audit
fix unchecked snprintf(3) in page header printing:
the length of the title is unknown, and speed doesn't matter here,
so use asprintf/free rather than a static buffer
KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change
Two minor tweaks regarding the fallback from -u/-d to default mode:
(1) Use all files found on the command line, but do *not* use all stray
files found during fallback tree recursion.
(2) If the fallback works, call that success, i.e. exit(0).
As pointed out by naddy@, the latter is required for ports' happiness.
Properly handle symlinks (hardlinks and .so only files were already ok):
Use the file name of the symlink but the inode number of the file pointed to,
such that we get multiple mlinks records but not multiple mpages records.
Also make sure they do not point outside the tree we are processing.
Issue found by kili@ in desktop-file-edit(1), thanks!
In update mode, when opening the database fails, probably because it is
missing or corrupt, just rebuild it from scratch. This also helps when
installing the very first port on a freshly installed machine
and is similar to what espie@'s classical makewhatis(8) did.
Garbage collect one pair of needless parentheses in SQL code generation;
note this doesn't affect performance, SQLite generates the same byte code.
While here, make the calls to exprspec() easier to understand.
Give the mlinks and keys tables a pageid index,
as suggested by jeremy@ and espie@.
The mlinks index speeds up basic apropos(1) searches by around 30%
because it speeds up the final SELECT FROM mlinks query by about 95%.
For large result sets, the overall speedup gets even larger, in the
extreme case of "apropos Nd~." bymore than 90%.
The keys index finally makes the apropos(1) -O option usable: It no longer
incurs relevant extra cost, while in the past it was embarrassingly slow.
This comes at a cost: Total database build times grow by about 5%,
and each index adds about 10% database size with -Q. I consider that
acceptable in view of the huge apropos(1) performance gains.
The -Q database for /usr/share/man still remains below 1 MB.
Pass the function flags SQLITE_UTF8 (because SQLITE_ANY is deprecated)
and SQLITE_DETERMINISTIC when creating deterministic functions;
best practice measure suggested by espie@ and jeremy@;
as expected by jeremy@, no measurable effect on performance.
At the end of mansearch(), fchdir() back to where we started from;
this is cleaner and helps to not scatter gmon.out files all over
the place when profiling.
Using macros in .Sh header lines, or having .Sm off or .Bk -words open
while processing .Sh, is not at all recommended, but it's not strictly
a syntax violation either, and in any case, mandoc must not die in an
assertion. I broke this in rev. 1.124.
Crash found while trying to read the (rather broken) original 4.3BSD-Reno
od(1) manual page.
Unify description handling across all document types (mdoc, man, cat).
Assert that the description is unset right before calling the parse_*
handler, and assign a default if it's still unset right afterwards.
Remove all stray asserts and default assignments found elsewhere.
This fixes SQL_STEP failures for man(7) pages lacking descriptions.
Further apropos(1) speed optimization was trickier than anticipated.
Contrary to what i initially thought, almost all time is now spent
inside sqlite3(3) routines, and i found no easy way calling less of them.
However, sqlite(3) spends substantial time in malloc(3), and even more
(twice that) in its immediate malloc wrapper, sqlite3MemMalloc(),
keeping track of all individual malloc chunk sizes. Typically about
90% of the malloced memory is used for purposes of the pagecache.
By providing an mmap(3) MAP_ANON SQLITE_CONFIG_PAGECACHE, execution
time decreases by 20-25% for simple (Nd and/or Nm) queries, 10-20% for
non-NAME queries, and even apropos(1) resident memory size as reported
by top(1) decreases by 20% for simple and by 60% for non-NAME queries.
The new function, mansearch_setup(), spends no measurable time.
The pagesize chosen is optimal:
* Substantially smaller pages yield no gain at all.
* Larger pages provide no additional benefit and just waste memory.
The chosen number of pages in the cache is a compromise:
* For simple queries, a handful of pages would suffice to get the full
speed effect, at an apropos(1) resident memory size of about 2.0 MB.
* For non-NAME queries, a large pagecache with 2k pages (2.5 MB) might
gain a few more percent in speed, but at the expense of doubling the
apropos(1) resident memory size for *all* queries.
* The chosen number of 256 pages (330 kB) allows nearly full speed gain
for all queries at the price of a 15% resident memory size increase.
Next speed optimization step for the new apropos(1).
Split manual names out of the common "keys" table into their
own "names" table. This reduces standard apropos(1) search
times (i.e. searching for names and descriptions only) by
typically about 70% for the full /usr/share/man database.
(Yes, that multiplies with the previous optimization step,
so both together have reduced search times by a factor of
more than six. I'm not done yet, expect more to come.)
Even with the minimal databases built with makewhatis(8) -Q,
this step still reduces search times by 15-20%. For both cases,
database sizes and build times hardly change (+/-2%).
After careful gprof(1)ing of the new apropos(1), move the descriptions
back from the keys table to the mpages table: I found a good way
to still use them in searches, without complication of the code.
On my notebook, this reduces typical apropos(1) search times by about 40%,
it reduces /usr/share/man database size by 6% in makewhatis(8) -Q mode
and by 2% in standard mode (less overhead storing pointers to mpages),
and it doesn't measurably change database build times (may even be
going down by a percent or so because less data is being copied
around in ohashes).
Add a new term_flushln() flag TERMP_BRIND (if break, then indent)
to control indentation of continuation lines in TERMP_NOBREAK mode.
In the past, this was always on; continue using it
for .Bl, .Nm, .Fn, .Fo, and .HP, but no longer for .IP and .TP.
I looked at this because sthen@ reported the issue in a manual
of a Perl module from ports, but it affects base, too: This patch
reduces groff-mandoc differences in base by more than 15%.
If the SYNOPSIS section contains an excessively long .Nm,
adjust the right margin to avoid running into an assertion;
output in that case now agrees with groff, too.
Fully implement the \B (validate numerical expression) and
partially implement the \w (measure text width) escape sequence
in a way that makes them usable in numerical expressions and in
conditional requests, similar to how \n (interpolate number register)
and \* (expand user-defined string) are implemented.
This lets mandoc(1) handle the baroque low-level roff code
found at the beginning of the ggrep(1) manual.
Thanks to pascal@ for the report.
We already supported (outer) user-defined strings containing references
to other (inner) user-defined strings in their values, such that the inner
ones get expanded at expansion time of the outer ones (delayed evaluation).
Now we also support specifying the name of an (outer) user-defined
string to expand using the expanded values of some other (inner)
user-defined strings (indirect reference).
Almost complete implementation of roff(7) numerical expressions.
Support all binary operators except ';' (scale conversion).
Fully support chained operations and nested parentheses.
Use this for the .nr, .if, and .ie requests.
While here, fix parsing of integer numbers in roff_getnum().
Implement the roff(7) .rr (remove register) request.
As reported by sthen@, the perl-5.18 pod2man(1) preamble
thinks cool kids use that in manuals. I hope *you* know better.
In -p (picky) mode, warn unless each filename (aka mlink)
appears as a name in the NAME section.
While here, garbage collect two unused variables, both called "match".
Warn about missing mlinks.
This is really expensive, more than tripling database build times,
so only do it when the -p (picky) option was given, but none of the
following options were given: -Q (quick), -d, -u, or -t.
Remember which names are in the NAME section.
This helps to find missing MLINKS.
Database build times do not change and database growth is minimal
(1.2% with -Q, 0.7% without -Q in /usr/share/man),
so making this optional would be pointless.
When the -n or -t flag is given to makewhatis(8),
write names and decriptions to stdout,
in a format similar to apropos(1) output.
Inspired by espie@'s makewhatis.
Instead of silently doing nothing at all,
warn and return non-zero when the manpath is empty, that is,
when /etc/man.conf is non-existent or unreadable
AND the environment variable MANPATH is unset or empty
AND no directories were given on the command line.
Inspired by the error handling in espie@'s makewhatis(8),
except that one doesn't know about MANPATH.
Rename the -W option to -p (mnemonics: picky, print to stderr):
That letter was already chosen by espie@ for OpenBSD 2.7,
so avoid being gratuitiously different more than a decade later.
Accept -v for backward compatibility with espie@'s makewhatis,
even though it does nothing right now.
The -v option of mandocdb(8) clashes with the -v option of espie@'s
makewhatis(8), which traditionally does something different,
so rename it to -D (mnemonics: Debug, Dump, Display).
Ingo Schwarze [Mon, 31 Mar 2014 01:05:32 +0000 (01:05 +0000)]
Support the CONTEXT section for kernel manual pages found in Solaris and
OpenBSD manuals. It describes which contexts you can call functions in.
from dlg@, ok jmc@ deraadt@
Ingo Schwarze [Fri, 28 Mar 2014 23:26:25 +0000 (23:26 +0000)]
Allow leading and trailing vertical lines,
and format them in the same way as groff.
While here, do not require whitespace before vertical lines
in layout specifications.
Issues found by bentley@ in mpv(1).
Ingo Schwarze [Fri, 28 Mar 2014 19:17:12 +0000 (19:17 +0000)]
Properly initialize malloc(3)ed memory.
With this bug fix, partly unitialized memory could sometimes be
returned, sometimes causing crashes by bogus free(3)s in apropos(1).
Ingo Schwarze [Wed, 26 Mar 2014 21:39:38 +0000 (21:39 +0000)]
Without bloating mandoc(1) itself, let mandocdb(8) support files
called manN/X.N.gz and catN/X.0.gz, reading them through a pipe(2)
from gunzip(1) -c. Asked for by various people in the past.
Ingo Schwarze [Wed, 26 Mar 2014 20:53:36 +0000 (20:53 +0000)]
Improve error reporting.
Simplify combining a custom format string with perror(),
avoiding many manual calls to strerror(errno).
For low-level failures, report attempted function calls.
Do not abuse the say() filename argument for files outside the basedir,
and even less for other text.
Ingo Schwarze [Sun, 23 Mar 2014 15:14:50 +0000 (15:14 +0000)]
Retire the old concat() function.
For .Sh, i wasn't even needed at all.
For .Dd, .Nm, and .Os, use the new mdoc_deroff() instead.
This gets rid of the last limited-size static buffers in this file,
hence eliminates the last explicit MANDOCERR_MEM throwers here,
and it shortens the code by 50 lines.
Ingo Schwarze [Sun, 23 Mar 2014 12:26:58 +0000 (12:26 +0000)]
If a man(7) NAME section contains macros, avoid truncated or empty
entries for .Nd in mandocdb(8), instead use the macro content
recursively. This improves indexing of more than 200 manuals
in Xenocara, i.e. more than 15%, in particular GL and some Xkb.
Ingo Schwarze [Sun, 23 Mar 2014 11:59:17 +0000 (11:59 +0000)]
The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
Ingo Schwarze [Sun, 23 Mar 2014 11:25:25 +0000 (11:25 +0000)]
The files mandoc.c and mandoc.h contained both specialised low-level
functions used for multiple languages (mdoc, man, roff), for example
mandoc_escape(), mandoc_getarg(), mandoc_eos(), and generic auxiliary
functions. Split the auxiliaries out into their own file and header.
While here, do some #include cleanup.
Ingo Schwarze [Thu, 20 Mar 2014 02:57:28 +0000 (02:57 +0000)]
Remove currently unimplemented macros from the lists of used-defined
macros to be cleared during .Dd and .TH because clearing them at that
point defeats the purpose of backup implementations provided in the
manual page itself, some of which _do_ work with mandoc(1).
While here, add the new .%C macro to the list to be cleared.
Ingo Schwarze [Wed, 19 Mar 2014 22:33:09 +0000 (22:33 +0000)]
Register pure .so pages as mlinks, not as mpages.
This doesn't affect /usr/share/man, but improves /usr/X11R6/man:
* Eliminates multiple apropos(1) output for such pages.
* Reduces X11R6 database size from 450 kB to 240 kB (-47%).
* Reduces X11R6 database build time from 1.68s to 1.00s (-40%).
Ingo Schwarze [Wed, 19 Mar 2014 22:20:43 +0000 (22:20 +0000)]
Without the MPARSE_SO option, if the file contains nothing but a
single .so request, do not read the file pointed to, but instead
let mparse_result() provide the file name pointed to as a return
value. To be used by makewhatis(8) in the future.
Ingo Schwarze [Wed, 19 Mar 2014 21:51:20 +0000 (21:51 +0000)]
Generalize the mparse_alloc() and roff_alloc() functions by giving
them an "options" argument, replacing the existing "inttype" and
"quick" arguments, preparing for a future MPARSE_SO option.
Store this argument in struct mparse and struct roff, replacing the
existing "inttype", "parsetype", and "quick" members.
No functional change except one tiny cosmetic fix in roff_TH().
Ingo Schwarze [Tue, 18 Mar 2014 16:56:10 +0000 (16:56 +0000)]
Allow checking that databases are up to date even when you have no write
permission on the databases, as requested by espie@ quite some time ago.
But make sure to not slow database generation down when you do have write
permission, and to not delay error reporting in -Q mode.
Ingo Schwarze [Mon, 17 Mar 2014 09:43:56 +0000 (09:43 +0000)]
Sync to OpenBSD:
* do not talk about shell globbing
* describe logical operations
* improve examples
* add HISTORY
* some wording improvements for clarity
Ingo Schwarze [Thu, 13 Mar 2014 19:23:50 +0000 (19:23 +0000)]
In -Tutf8 mode, make sure that hyphens get counted against the output line
length even when they are breakable. Before this, a line containing N
breakable hyphens could get up to N characters wider than the right margin
in -Tutf8 output mode.
Issue reported by tedu@ on <bugs at OpenBSD>.
Ingo Schwarze [Sat, 8 Mar 2014 16:22:04 +0000 (16:22 +0000)]
In .nf mode, use the MAN_LINE flag to detect input line breaks
instead of the man_node line member. This is required to preserve
line breaks contained in user-defined macros called in .nf mode.
Found in a code audit triggered by fixing a similar issue in .TP.
Ingo Schwarze [Sat, 8 Mar 2014 15:50:41 +0000 (15:50 +0000)]
To find out whether .TP head arguments are same-line or next-line arguments,
use the MAN_LINE flag instead of the man_node line member.
This is required such that user-defined macros wrapping .TP work correctly.
Issue found by Havard Eidnes in Tcl_NewStringObj(3), reported via
the NetBSD bug tracking system and Thomas Klausner <wiz at NetBSD>.