Simplify pathgen() even more.
Let manpath.conf be a plain text list of the directories to use.
As a bonus, this makes the order configurable.
Get rid of <dirent.h>, opendir(3), readdir(3), stat(2).
Switch over man.cgi to SQLite. While here:
* Simplify pathgen(), just use the subdirs of the cache dir.
* Simplify URI paths, just use show/<manpath>/<filename>.
* Drop struct paths, just use plain strings.
* Garbage collect unused headers.
Simplify man_unscope(), removing 18 lines of code, that is,
removing one function argument, one function definition,
three function invocations and two pointless assert()s.
No functional change.
Clean up messages related to plain text and to escape sequences.
* Mention invalid escape sequences and string names, and fallbacks.
* Hierarchical naming.
Fix expansion of escape sequences with incomplete arguments.
* For \* and \n, discard the incomplete arg, expand to empty string.
* For \B, discard the incomplete arg, expand to the digit 0.
* For \w, use the incomplete arg (behaviour unchanged).
Fix handling of escape sequences taking numeric arguments.
* Repair detection of invalid delimiters.
* Discard the invalid delimiter together with the invalid sequence.
Note to self: In general, strchr("\0...", c) is a thoroughly bad idea.
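
A minimal sketch of why that idiom misfires, assuming a hypothetical delimiter set of NUL, blank and tab (the helper names and the set are illustrative, not mandoc code). strchr(3) treats its first argument as a string, so a literal that starts with \0 looks empty and only the terminator itself can ever match; memchr(3) with an explicit length searches every byte.

```c
#include <string.h>

/* Hypothetical delimiter set: NUL, blank, tab. */
static const char delims[] = { '\0', ' ', '\t' };

/* Broken: the literal begins with '\0', so strchr() sees an empty string. */
int
is_delim_broken(int c)
{
	return strchr("\0 \t", c) != NULL;
}

/* Fixed: memchr() is told exactly how many bytes to search. */
int
is_delim_fixed(int c)
{
	return memchr(delims, c, sizeof(delims)) != NULL;
}
```

With the broken variant, a blank is never recognized as a delimiter, while the NUL byte still is, which makes the bug easy to overlook.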
Cleanup with respect to bad macro arguments.
* Fix .Sm with invalid arg: move arg out and toggle mode.
* Promote "unknown standard" from WARNING to ERROR, it loses information.
* Delete MANDOCERR_BADWIDTH, it would only indicate a mandoc(1) bug.
* Do not report MANDOCERR_BL_LATETYPE when there is no type at all.
* Mention macro names, arguments and fallbacks.
Cleanup regarding -offset and -width:
* Bugfix: Last one wins, not first one.
* Fix .Bl -width without argument: it means 0n, so do not ignore it.
* Report macro names, argument names and fallbacks in related messages.
* Simplify: Garbage collect auxiliary variables in pre_bd() and pre_bl().
Clean up messages regarding excess arguments:
* Downgrade ".Bf -emphasis Em" from FATAL to WARNING.
* Mention the macros, the arguments, and the fallbacks.
* Hierarchical naming.
Clean up messages related to missing arguments.
* Do not warn about empty -column cells, they seem valid to me.
* Downgrade empty item and missing -std from ERROR to WARNING.
* Hierarchical naming.
* Descriptive, not imperative style.
* Mention macro names, argument names, and fallbacks.
* Garbage collect some unreachable code in post_it().
Fix formatting of empty .Bl -inset item heads.
Downgrade empty item heads from ERROR to WARNING.
Show the list type in the error message.
Choose better variable names for nodes in post_it().
MANDOCERR_NOARGS reported three completely unrelated classes of problems.
Split the roff(7) parts out of it and report the request names for these cases.
When .Sm is called without an argument, groff toggles the spacing mode,
so let us do the same for compatibility. Using this feature is of
course not recommended except in manual page obfuscation contests.
Disentangle the MANDOCERR_CHILD message, which reported three
completely different things, into three distinct messages.
Also mention the macro names we are talking about.
Clean up warnings related to macros and nesting.
* Hierarchical naming of enum mandocerr items.
* Improve the wording to make it comprehensible.
* Mention the offending macro.
* Garbage collect one chunk of ancient, long unreachable code.
Fix the column numbers associated with in_line_argn() macros;
this bug is more than four years old, introduced by kristaps@
in mdocml.bsd.lv rev. 1.46, March 30, 2010.
Implement the obsolete macros .En .Es .Fr .Ot for backward compatibility,
since this is hardly more complicated than explicitly ignoring them
as we did in the past. Of course, do not use them!
Clean up the warnings related to document structure.
* Hierarchical naming of the related enum mandocerr items.
* Mention the offending macro, section title, or string.
While here, improve some wordings:
* Descriptive instead of imperative style.
* Uniform style for "missing" and "skipping".
* Where applicable, mention the fallback used.
The previous commit to this file broke the control flow keywords \{ and \}
when they immediately follow a request or macro name, without intervening
whitespace. Minimal fix.
The lesson learnt here is that, despite their appearance, \{ and \} are
not escape sequences, so never skip them when parsing for names.
Ingo Schwarze [Sun, 29 Jun 2014 23:26:00 +0000 (23:26 +0000)]
Use the freshly improved roff_getname() function
for the main roff request parsing routine, roff_parse().
In request or macro invocations, escape sequences now terminate the
request or macro name; what follows is treated as arguments. Besides,
the names of user-defined macros can now contain backslashes (eek!).
Ingo Schwarze [Sun, 29 Jun 2014 22:38:47 +0000 (22:38 +0000)]
Use the freshly improved roff_getname() function
for the .de parsing routine, roff_block(),
to correctly handle names terminated by escape sequences.
Besides, this saves us 20 lines of code.
Ingo Schwarze [Sun, 29 Jun 2014 22:14:10 +0000 (22:14 +0000)]
Major roff_getname() cleanup.
* Return the name even if it is terminated by an escape sequence, not a blank.
* Skip the full escape sequence using mandoc_escape(), not just the first byte.
* Make it non-destructive, return the length instead of writing a '\0'.
* Let .ds and .as cope with the above changes to the internal interface.
* Fix .rm and .rr to accept an escape sequence as the end of a name.
* Fix .nr and .rr to not set/delete a register with an empty name.
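
The non-destructive interface described above can be sketched roughly as follows. This is a simplified illustration, not the actual roff_getname(): the function name is hypothetical, and it assumes that any backslash starts an escape sequence that ends the name, without parsing the sequence itself.

```c
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the name at cp
 * without modifying the buffer.  The name ends at a blank,
 * a tab, the end of the string, or the start of an escape
 * sequence (simplified here to "any backslash").
 */
size_t
name_length(const char *cp)
{
	size_t len = 0;

	while (cp[len] != '\0' && cp[len] != ' ' &&
	    cp[len] != '\t' && cp[len] != '\\')
		len++;
	return len;
}
```

Because the caller gets a length rather than a NUL-terminated copy, it can reject an empty name (length 0) before touching any register or string table.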
Ingo Schwarze [Wed, 25 Jun 2014 00:20:19 +0000 (00:20 +0000)]
Improve messages related to the roff(7) .so request.
In all these messages, show the filename argument that was passed
to the .so request.
In case of failure, show an additional message reporting the file
and the line number where the failing request was found.
The existing message reporting the reason for the failure -
for example, "Permission denied" - is left in place, unchanged.
Inspired by a question asked by Nick@ after he saw the
confusing old messages that used to be emitted in this area.
Ingo Schwarze [Tue, 24 Jun 2014 21:43:08 +0000 (21:43 +0000)]
Deprecate .Tn and .Ux, and make it clearer that .Bt and .Ud are deprecated.
Do not use these macros in new documents, they provide no value.
Instead, usually no macro and no markup is needed at all.
Of course, they remain supported for compatibility with existing manuals.
Jason McIntyre (OpenBSD), Thomas Klausner (NetBSD) and
Franco Fichtner (DragonFly) are OK with this documentation change.
Ingo Schwarze [Sun, 22 Jun 2014 17:07:06 +0000 (17:07 +0000)]
Minimal COMPATIBILITY cleanup:
* Mention that the list is incomplete.
* I implemented %C for groff -current, and it was accepted.
* Font family is \F, not \f.
* Escapes and scaling widths are documented in roff(7), not here.
* Quoting quotes by doubling them is now supported.
Ingo Schwarze [Sun, 22 Jun 2014 16:39:45 +0000 (16:39 +0000)]
Minimal cleanup of the COMPATIBILITY section:
* Mention that the list is incomplete.
* Quoting quotes by doubling them is documented in the
Ossanna/Kernighan/Ritter Nroff/Troff User's Manual, Section 7.3.
* Our roff(7) manual documents handling of escape sequences;
besides, we partially support \w and \z now.
* Scaling widths are documented in roff(7) as well, and f is not \f.
* Negative arguments to .sp are handled now.
Ingo Schwarze [Sat, 21 Jun 2014 16:18:25 +0000 (16:18 +0000)]
Prefix messages about bad command line options and arguments
with "mandoc: " or "makewhatis: ", respectively,
similar to what we already do for other messages.
Ingo Schwarze [Fri, 20 Jun 2014 23:02:31 +0000 (23:02 +0000)]
As suggested by jmc@, only include line and column numbers in messages
when they are meaningful, to avoid confusing stuff like this:
$ mandoc /dev/null
mandoc: /dev/null:0:1: FATAL: not a manual
Instead, just say:
mandoc: /dev/null: FATAL: not a manual
Another example this applies to is documents having a prologue,
but lacking a body. Do not throw a FATAL error for these; instead,
issue a WARNING and show the empty document, in the man(7) case with
the same number of blank lines as groff does. Also downgrade mdoc(7)
documents having content before the first .Sh from FATAL to WARNING.
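
The rule about omitting line and column numbers can be sketched like this. The function name, its signature, and the convention that a line number of 0 means "no meaningful position" are assumptions made for illustration; only the two message formats come from the example above.

```c
#include <stdio.h>

/*
 * Hypothetical formatter: include ":line:col" only when the
 * position is meaningful, assumed here to mean line > 0.
 * Returns what snprintf() returns.
 */
int
format_msg(char *buf, size_t sz, const char *file, int line, int col,
    const char *level, const char *msg)
{
	if (line > 0)
		return snprintf(buf, sz, "mandoc: %s:%d:%d: %s: %s",
		    file, line, col, level, msg);
	return snprintf(buf, sz, "mandoc: %s: %s: %s", file, level, msg);
}
```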
Ingo Schwarze [Fri, 20 Jun 2014 17:24:00 +0000 (17:24 +0000)]
Start systematic improvements of error reporting.
So far, this covers all WARNINGs related to the prologue.
1) hierarchical naming of MANDOCERR_* constants
2) mention the macro name in messages where that adds clarity
3) add one missing MANDOCERR_DATE_MISSING msg
4) fix the wording of one message related to the man(7) prologue
Ingo Schwarze [Fri, 20 Jun 2014 16:11:42 +0000 (16:11 +0000)]
Prefix error messages from mandoc(1) with "mandoc: "
just like almost all other utility programs do.
Suggested by nick@ who wondered where messages came from
when calling mandoc(1) from inside a Perl script.
ok jmc@ nick@
Ingo Schwarze [Fri, 20 Jun 2014 02:24:40 +0000 (02:24 +0000)]
Merge from OpenBSD - Marc Espie improved the ohash interface:
* rename the halloc callback to calloc, provide overflow protection
* rename the hfree callback to free, drop the useless size argument
* prevent integer overflows in ohash_resize
Ingo Schwarze [Fri, 20 Jun 2014 01:21:48 +0000 (01:21 +0000)]
More tweaking of set_basedir().
1) Do not error out when getcwd(3) fails, only fail when inaccessibility
of the cwd prevents processing of relative paths given on the command line.
2) Do not uselessly call set_basedir() twice in a row.
While fts_read(3) in treescan() does cause the cwd to jump around,
fts_close(3) is always called at the end, putting us back
where we came from. The -d/-u fallback code already relied on this.
Ingo Schwarze [Thu, 19 Jun 2014 00:45:37 +0000 (00:45 +0000)]
Some simple set_basedir() cleanup; more to come.
1) Refrain from calling set_basedir() in the -t case,
and do not attempt to strip anything from the file names in that case.
Testing individual files cannot reasonably have any notion of a base dir.
2) Remove the possibility of passing NULL to set_basedir().
It was dangerous because it was not idempotent, and it served no purpose
except closing a file descriptor right before exit(), which is pointless.
Besides, the file descriptor is likely to be removed completely, soon.
3) Make sure that /foobar isn't treated as a subdirectory of /foo;
this fixes a bug reported by espie@.
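
Item 3 is the classic prefix-comparison pitfall. A minimal sketch of a correct check, with a hypothetical function name and without the path canonicalization that real code would need first:

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 if path is base itself or lies
 * below base, 0 otherwise.  A plain strncmp() prefix match is
 * not enough, since "/foobar" starts with "/foo"; the byte
 * after the prefix must be a '/' or the end of the string.
 */
int
path_is_below(const char *base, const char *path)
{
	size_t len = strlen(base);

	if (len == 0 || strncmp(base, path, len) != 0)
		return 0;
	if (base[len - 1] == '/')	/* base such as "/" or "/foo/" */
		return 1;
	return path[len] == '/' || path[len] == '\0';
}
```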
Ingo Schwarze [Wed, 18 Jun 2014 19:34:04 +0000 (19:34 +0000)]
Merge OpenBSD rev. 1.108 by sthen@; original commit message:
Don't display "unable to open mandoc.db" error messages (SQLITE_CANTOPEN)
in the code which opens mandocdb's sqlite database when updating/deleting
individual files (as used and only really useful for pkg_add/pkg_delete).
Ingo Schwarze [Wed, 7 May 2014 16:19:03 +0000 (16:19 +0000)]
Render roff escape sequences contained in manual page descriptions
before putting them into the mpages table.
Issue found by bentley@ in OpenBSD::Getopt(3p).
Improve error handling in dbopen(). If PRAGMA SQL statements fail,
report the error, close the database, and return failure from dbopen(),
such that the main program can recover and rebuild the database.
As noticed by stsp@, this can happen when database files are
accessible, but corrupt or in the wrong format, which will now
automatically be repaired.
Besides, use a safer idiom after sqlite3_open*() failure that also
handles out-of-memory situations correctly, and do not forget to
close the database after CREATE TABLE failure.
OMRON used uppercase for the model names of their Motorola 88100 LUNA
workstations, so show the kernel architecture names in uppercase
to the user, too.
Based on a patch from Kenji Aoyama@, thanks!
Fix a minor optimization I broke in rev. 1.163 on August 20, 2010:
Do not bother looking into the hash table when the length of the macro
already tells us it's invalid. No functional change.
Noticed by jsg@, thanks!
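
The idea of that optimization, sketched under stated assumptions: the macro table below is a tiny hypothetical subset, a linear scan stands in for the real hash table, and the length bounds of two and three characters are assumed for illustration.

```c
#include <string.h>

/* Hypothetical, tiny subset of a macro table. */
static const char *const macros[] = { "Dd", "Dt", "Os", "Sh", "Brq" };

/*
 * Return the table index of name, or -1 if it is not a macro.
 * Names of the wrong length are rejected before any lookup.
 */
int
macro_lookup(const char *name)
{
	size_t len = strlen(name);
	size_t i;

	if (len < 2 || len > 3)		/* cheap length check first */
		return -1;
	for (i = 0; i < sizeof(macros) / sizeof(macros[0]); i++)
		if (strcmp(macros[i], name) == 0)
			return (int)i;
	return -1;
}
```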
Reduce the verbosity of makewhatis -t:
In the past, it always showed the title lines of the files processed.
Now, it only shows them when called with -D.
That is better because pkg_create calls makewhatis -t.
It is also more consistent with -D behaviour in non- -t modes.
Issue reported by ajacoutot@; ok espie@ ajacoutot@ jasper@.
Various Makefile improvements:
* Use sha256 rather than md5.
* Update .h dependencies for some objects.
* Provide `www' target to build everything needed for the web site.
* Move .SUFFIXES and .PHONY technicalities to the bottom.
* State Copyright and license, just for clarity.
Audit malloc(3)/calloc(3)/realloc(3) usage.
* Change eight reallocs to reallocarray to be safe from overflows.
* Change one malloc to reallocarray to be safe from overflows.
* Change one calloc to reallocarray, no zeroing needed.
* Change the order of arguments of three callocs (aesthetical).
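
For readers unfamiliar with reallocarray(3), the overflow check that motivates these changes looks roughly like the following. This is a portable sketch with a hypothetical name, not the libc implementation: it refuses the request when nmemb * size would wrap around, instead of silently allocating too little memory.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for reallocarray(3): fail with ENOMEM
 * if nmemb * size would overflow size_t, else defer to realloc().
 */
void *
checked_reallocarray(void *ptr, size_t nmemb, size_t size)
{
	if (size != 0 && nmemb > SIZE_MAX / size) {
		errno = ENOMEM;
		return NULL;
	}
	return realloc(ptr, nmemb * size);
}
```

As with realloc(3), the original pointer stays valid when the call fails, so the caller still has to free it.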
Audit strlcpy(3)/strlcat(3) usage:
* Add missing truncation checks to three calls.
* In four cases where we know that the destination buffer is large enough,
cast the return value to (void).
* Repair three instances of silent truncation, use asprintf(3).
* Change two instances of strlen(3)+malloc(3)+strlcpy(3)+strlcat(3)+...
to use asprintf(3) instead to make them less error prone.
* Cast the return value of four instances where the destination
buffer is known to be large enough to (void).
* Completely remove three useless instances of strlcpy(3)/strlcat(3).
* Mark two places in -Thtml with XXX that can cause information loss
and crashes but are not easy to fix, requiring design changes of
some internal interfaces.
* The file mandocdb.c remains to be audited.
in debug messages, truncating strings of excessive lengths is actually
a good thing, so cast the return value from snprintf(3) to (void);
this concludes the mandoc snprintf(3) audit
fix unchecked snprintf(3) in page header printing:
the length of the title is unknown, and speed doesn't matter here,
so use asprintf/free rather than a static buffer
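
A sketch of the same approach for systems without asprintf(3), using a two-pass snprintf(3) as a portable stand-in. The function name and the "%s(%s)" format are illustrative assumptions, not the actual header code.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: build "title(sec)" in a buffer sized to
 * fit, so a title of any length is never truncated.
 * The caller must free() the result.  Returns NULL on failure.
 */
char *
format_title(const char *title, const char *sec)
{
	char *buf;
	int len;

	/* First pass: measure the formatted length. */
	len = snprintf(NULL, 0, "%s(%s)", title, sec);
	if (len < 0)
		return NULL;
	buf = malloc((size_t)len + 1);
	if (buf == NULL)
		return NULL;
	/* Second pass: format into the correctly sized buffer. */
	snprintf(buf, (size_t)len + 1, "%s(%s)", title, sec);
	return buf;
}
```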
KNF: case (FOO): -> case FOO:, remove /* LINTED */ and /* ARGSUSED */,
remove trailing whitespace and blanks before tabs, improve some indenting;
no functional change
Two minor tweaks regarding the fallback from -u/-d to default mode:
(1) Use all files found on the command line, but do *not* use all stray
files found during fallback tree recursion.
(2) If the fallback works, call that success, i.e. exit(0).
As pointed out by naddy@, the latter is required for ports' happiness.
Properly handle symlinks (hardlinks and .so only files were already ok):
Use the file name of the symlink but the inode number of the file pointed to,
such that we get multiple mlinks records but not multiple mpages records.
Also make sure they do not point outside the tree we are processing.
Issue found by kili@ in desktop-file-edit(1), thanks!
In update mode, when opening the database fails, probably because it is
missing or corrupt, just rebuild it from scratch. This also helps when
installing the very first port on a freshly installed machine
and is similar to what espie@'s classical makewhatis(8) did.
Garbage collect one pair of needless parentheses in SQL code generation;
note this doesn't affect performance, SQLite generates the same byte code.
While here, make the calls to exprspec() easier to understand.
Give the mlinks and keys tables a pageid index,
as suggested by jeremy@ and espie@.
The mlinks index speeds up basic apropos(1) searches by around 30%
because it speeds up the final SELECT FROM mlinks query by about 95%.
For large result sets, the overall speedup gets even larger, in the
extreme case of "apropos Nd~." by more than 90%.
The keys index finally makes the apropos(1) -O option usable: It no longer
incurs relevant extra cost, while in the past it was embarrassingly slow.
This comes at a cost: Total database build times grow by about 5%,
and each index adds about 10% database size with -Q. I consider that
acceptable in view of the huge apropos(1) performance gains.
The -Q database for /usr/share/man still remains below 1 MB.
Pass the function flags SQLITE_UTF8 (because SQLITE_ANY is deprecated)
and SQLITE_DETERMINISTIC when creating deterministic functions;
best practice measure suggested by espie@ and jeremy@;
as expected by jeremy@, no measurable effect on performance.
At the end of mansearch(), fchdir() back to where we started from;
this is cleaner and helps to not scatter gmon.out files all over
the place when profiling.
Using macros in .Sh header lines, or having .Sm off or .Bk -words open
while processing .Sh, is not at all recommended, but it's not strictly
a syntax violation either, and in any case, mandoc must not die in an
assertion. I broke this in rev. 1.124.
Crash found while trying to read the (rather broken) original 4.3BSD-Reno
od(1) manual page.
Unify description handling across all document types (mdoc, man, cat).
Assert that the description is unset right before calling the parse_*
handler, and assign a default if it's still unset right afterwards.
Remove all stray asserts and default assignments found elsewhere.
This fixes SQL_STEP failures for man(7) pages lacking descriptions.
Further apropos(1) speed optimization was trickier than anticipated.
Contrary to what I initially thought, almost all time is now spent
inside sqlite3(3) routines, and I found no easy way to call fewer of them.
However, sqlite3(3) spends substantial time in malloc(3), and even more
(twice that) in its immediate malloc wrapper, sqlite3MemMalloc(),
keeping track of all individual malloc chunk sizes. Typically about
90% of the malloced memory is used for purposes of the pagecache.
By providing an mmap(3) MAP_ANON SQLITE_CONFIG_PAGECACHE, execution
time decreases by 20-25% for simple (Nd and/or Nm) queries, 10-20% for
non-NAME queries, and even apropos(1) resident memory size as reported
by top(1) decreases by 20% for simple and by 60% for non-NAME queries.
The new function, mansearch_setup(), spends no measurable time.
The pagesize chosen is optimal:
* Substantially smaller pages yield no gain at all.
* Larger pages provide no additional benefit and just waste memory.
The chosen number of pages in the cache is a compromise:
* For simple queries, a handful of pages would suffice to get the full
speed effect, at an apropos(1) resident memory size of about 2.0 MB.
* For non-NAME queries, a large pagecache with 2k pages (2.5 MB) might
gain a few more percent in speed, but at the expense of doubling the
apropos(1) resident memory size for *all* queries.
* The chosen number of 256 pages (330 kB) allows nearly full speed gain
for all queries at the price of a 15% resident memory size increase.
Next speed optimization step for the new apropos(1).
Split manual names out of the common "keys" table into their
own "names" table. This reduces standard apropos(1) search
times (i.e. searching for names and descriptions only) by
typically about 70% for the full /usr/share/man database.
(Yes, that multiplies with the previous optimization step,
so both together have reduced search times by a factor of
more than six. I'm not done yet, expect more to come.)
Even with the minimal databases built with makewhatis(8) -Q,
this step still reduces search times by 15-20%. For both cases,
database sizes and build times hardly change (+/-2%).
After careful gprof(1)ing of the new apropos(1), move the descriptions
back from the keys table to the mpages table: I found a good way
to still use them in searches, without complication of the code.
On my notebook, this reduces typical apropos(1) search times by about 40%,
it reduces /usr/share/man database size by 6% in makewhatis(8) -Q mode
and by 2% in standard mode (less overhead storing pointers to mpages),
and it doesn't measurably change database build times (may even be
going down by a percent or so because less data is being copied
around in ohashes).
Add a new term_flushln() flag TERMP_BRIND (if break, then indent)
to control indentation of continuation lines in TERMP_NOBREAK mode.
In the past, this was always on; continue using it
for .Bl, .Nm, .Fn, .Fo, and .HP, but no longer for .IP and .TP.
I looked at this because sthen@ reported the issue in a manual
of a Perl module from ports, but it affects base, too: This patch
reduces groff-mandoc differences in base by more than 15%.
If the SYNOPSIS section contains an excessively long .Nm,
adjust the right margin to avoid running into an assertion;
output in that case now agrees with groff, too.
Fully implement the \B (validate numerical expression) and
partially implement the \w (measure text width) escape sequence
in a way that makes them usable in numerical expressions and in
conditional requests, similar to how \n (interpolate number register)
and \* (expand user-defined string) are implemented.
This lets mandoc(1) handle the baroque low-level roff code
found at the beginning of the ggrep(1) manual.
Thanks to pascal@ for the report.
We already supported (outer) user-defined strings containing references
to other (inner) user-defined strings in their values, such that the inner
ones get expanded at expansion time of the outer ones (delayed evaluation).
Now we also support specifying the name of an (outer) user-defined
string to expand using the expanded values of some other (inner)
user-defined strings (indirect reference).