Ingo Schwarze [Wed, 19 Mar 2014 21:51:20 +0000 (21:51 +0000)]
Generalize the mparse_alloc() and roff_alloc() functions by giving
them an "options" argument, replacing the existing "inttype" and
"quick" arguments, preparing for a future MPARSE_SO option.
Store this argument in struct mparse and struct roff, replacing the
existing "inttype", "parsetype", and "quick" members.
No functional change except one tiny cosmetic fix in roff_TH().
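A minimal sketch of the options-word pattern. The flag names (OPT_MDOC, OPT_MAN, OPT_QUICK, OPT_SO) and their values are hypothetical, not mandoc's real constants. One integer replaces the separate type and "quick" arguments, so a future option only needs a new bit, not a new parameter.

```c
#include <stdlib.h>

/* Hypothetical option bits; the real mandoc values may differ. */
#define OPT_MDOC	(1 << 0)	/* force mdoc(7) parsing */
#define OPT_MAN		(1 << 1)	/* force man(7) parsing */
#define OPT_QUICK	(1 << 2)	/* abort after the NAME section */
#define OPT_SO		(1 << 3)	/* reserved for a future .so option */

/* Hypothetical parser state: one options word replaces
 * separate type and "quick" members. */
struct parser {
	int	 options;
};

struct parser *
parser_alloc(int options)
{
	struct parser	*p;

	if ((p = calloc(1, sizeof(*p))) == NULL)
		return NULL;
	p->options = options;
	return p;
}

/* Callers test individual bits instead of separate members. */
int
parser_is_quick(const struct parser *p)
{
	return (p->options & OPT_QUICK) != 0;
}
```

Adding an option later means defining one more bit; existing callers of parser_alloc() keep compiling unchanged.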
Ingo Schwarze [Tue, 18 Mar 2014 16:56:10 +0000 (16:56 +0000)]
Allow checking that databases are up to date even when you have no write
permission on the databases, as requested by espie@ quite some time ago.
But make sure to not slow database generation down when you do have write
permission, and to not delay error reporting in -Q mode.
Ingo Schwarze [Mon, 17 Mar 2014 09:43:56 +0000 (09:43 +0000)]
Sync to OpenBSD:
* do not talk about shell globbing
* describe logical operations
* improve examples
* add HISTORY
* some wording improvements for clarity
Ingo Schwarze [Thu, 13 Mar 2014 19:23:50 +0000 (19:23 +0000)]
In -Tutf8 mode, make sure that hyphens get counted against the output line
length even when they are breakable. Before this, a line containing N
breakable hyphens could get up to N characters wider than the right margin
in -Tutf8 output mode.
Issue reported by tedu@ on <bugs at OpenBSD>.
Ingo Schwarze [Sat, 8 Mar 2014 16:22:04 +0000 (16:22 +0000)]
In .nf mode, use the MAN_LINE flag to detect input line breaks
instead of the man_node line member. This is required to preserve
line breaks contained in user-defined macros called in .nf mode.
Found in a code audit triggered by fixing a similar issue in .TP.
Ingo Schwarze [Sat, 8 Mar 2014 15:50:41 +0000 (15:50 +0000)]
To find out whether .TP head arguments are same-line or next-line arguments,
use the MAN_LINE flag instead of the man_node line member.
This is required so that user-defined macros wrapping .TP work correctly.
Issue found by Havard Eidnes in Tcl_NewStringObj(3), reported via
the NetBSD bug tracking system and Thomas Klausner <wiz at NetBSD>.
Ingo Schwarze [Sat, 8 Mar 2014 04:43:54 +0000 (04:43 +0000)]
Improve .if/.ie condition handling.
* Support string comparisons.
* Support negation not only for numerical, but for all conditions.
* Switch the `o' condition from false to true.
* Handle the `c', `d', and `r' conditions as false for now.
* Use int for boolean data instead of rolling our own "enum roffrule";
needed so that we can use the standard ! and == operators.
Havard Eidnes reported via the NetBSD bug tracking system that some
Tcl*(3) manuals need this, and Thomas Klausner <wiz at NetBSD>
forwarded the report to me. This doesn't make the crazy Tcl*(3)
macrology maze happy yet, but brings us a bit closer.
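A simplified sketch of the condition rules listed above. It assumes a hypothetical eval_cond() that sees only the condition text, and it is not mandoc's roff.c. Negation applies to every condition, `o' is true, `c', `d', and `r' are false, and a string comparison uses whatever delimiter character starts it. Numeric expressions are reduced to a single integer here.

```c
#include <stdlib.h>
#include <string.h>

/* Evaluate a simplified .if/.ie condition.  The result is a plain
 * int used as a boolean, so ! and == work as usual.  Sketch only:
 * no numeric expressions, no escapes inside compared strings. */
int
eval_cond(const char *s)
{
	const char	*a, *b, *end;
	size_t		 alen, blen;
	char		 delim;
	int		 negate = 0;

	if (*s == '!') {	/* negation now works for all conditions */
		negate = 1;
		s++;
	}
	switch (*s) {
	case 'n':	/* nroff mode: true */
	case 'o':	/* odd page: now true */
		return !negate;
	case 't':	/* troff mode: false */
	case 'e':	/* even page: false */
	case 'c':	/* character defined: false for now */
	case 'd':	/* string or macro defined: false for now */
	case 'r':	/* register defined: false for now */
	case '\0':
		return negate;
	default:
		break;
	}
	if (*s >= '0' && *s <= '9')	/* numeric: positive is true */
		return (atoi(s) > 0) != negate;

	/* String comparison: 'first'second' with any delimiter. */
	delim = *s;
	a = s + 1;
	if ((b = strchr(a, delim)) == NULL)
		return negate;
	alen = (size_t)(b - a);
	b++;
	if ((end = strchr(b, delim)) == NULL)
		return negate;
	blen = (size_t)(end - b);
	return (alen == blen && strncmp(a, b, alen) == 0) != negate;
}
```

For example, eval_cond("'abc'abc'") is true and eval_cond("!'abc'abc'") is false.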
Ingo Schwarze [Fri, 7 Mar 2014 18:37:37 +0000 (18:37 +0000)]
In roff_cond_sub(), make sure that the incorrect input sequence `\\}',
when found on a macro line, does not close a conditional block.
The companion function roff_cond_text() already did this correctly,
but make the code more readable without functional change.
While here, report the correct column number in related error messages.
Ingo Schwarze [Fri, 7 Mar 2014 02:22:05 +0000 (02:22 +0000)]
Three bugfixes related to the closing of conditional blocks:
1. Handle more than one `\}' on macro lines, as it was already done
for text lines.
2. Do not treat `\}' as a macro invocation after a dot at the beginning
of a line. That allows more than one `\}' to work on lines starting
with `.\}'. It also simplifies the code.
3. Do not complain about characters following `\}'. Those are not lost,
but handled normally both on text and macro lines.
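A sketch of the counting rule for `\}', assuming a hypothetical count_block_closers() that scans one raw input line; mandoc's roff_cond_sub() and roff_cond_text() differ in detail. Every backslash escapes the following character, so `\\}' is an escaped backslash followed by a literal brace and closes nothing, while each genuine `\}' on the line is counted.

```c
#include <stddef.h>

/* Count conditional-block closers on one input line.  Any escape,
 * including an escaped backslash, consumes the next character.
 * Characters after a closer are not diagnosed; they are left for
 * normal processing. */
int
count_block_closers(const char *line)
{
	int	 n = 0;

	while (*line != '\0') {
		if (*line++ != '\\')
			continue;
		if (*line == '\0')
			break;		/* trailing lone backslash */
		if (*line == '}')
			n++;
		line++;			/* skip the escaped character */
	}
	return n;
}
```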
Ingo Schwarze [Wed, 5 Mar 2014 23:14:46 +0000 (23:14 +0000)]
In -Tutf8 mode, mandoc_char(7) named accent character escape sequences
have to render as non-combining accents; if you want combining accents,
you have to explicitly specify them using the Unicode character numbers
for combining accents, or you can use character escape sequences for
accented characters. This lets mandoc behave like groff.
Additionally, both the Ossanna/Kernighan/Ritter troff manual and
the GNU troff manual say that \' and \` are equivalent to \(aa and
\(ga, respectively, so do the same for these. This mitigates issues
with man(7) code autogenerated by texinfo2man(1), which mistranslates
TeX ` and ' to \` and \' instead of \(oq and \(cq as reported by
sthen@ and as analyzed by bentley@.
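A sketch of the -Tutf8 mapping described above. The Unicode code points are standard (U+00B4 ACUTE ACCENT and U+0060 GRAVE ACCENT, as opposed to the combining U+0301 and U+0300); the table layout and the accent_codepoint() name are hypothetical.

```c
#include <string.h>

/* Non-combining code points for two named accent escapes. */
struct accent {
	const char	*name;	/* special character name, as in \(aa */
	unsigned int	 cp;	/* Unicode code point to emit */
};

static const struct accent accents[] = {
	{ "aa", 0x00B4 },	/* ACUTE ACCENT, not COMBINING ACUTE U+0301 */
	{ "ga", 0x0060 },	/* GRAVE ACCENT, not COMBINING GRAVE U+0300 */
};

/* Treat the one-character escapes \' and \` as \(aa and \(ga,
 * then look up the code point.  Returns 0 for unknown names. */
unsigned int
accent_codepoint(const char *esc)
{
	size_t	 i;

	if (strcmp(esc, "'") == 0)
		esc = "aa";
	else if (strcmp(esc, "`") == 0)
		esc = "ga";
	for (i = 0; i < sizeof(accents) / sizeof(accents[0]); i++)
		if (strcmp(accents[i].name, esc) == 0)
			return accents[i].cp;
	return 0;
}
```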
Ingo Schwarze [Mon, 3 Mar 2014 18:53:27 +0000 (18:53 +0000)]
- remove index.html, it is now part of the website repo
- install mandocdb, manpage, and apropos
- and some general cleanup (e.g., installcgi is .PHONY)
Ingo Schwarze [Mon, 3 Mar 2014 17:08:26 +0000 (17:08 +0000)]
Move the regression suite to the attic.
It has not been used or maintained for several years,
and we won't start using it now.
Development regression testing is done in OpenBSD, and
there is no value in maintaining two regression suites in parallel.
Ingo Schwarze [Sun, 16 Feb 2014 14:26:55 +0000 (14:26 +0000)]
After Werner Lemberg accepted and committed some updates to the manual
page template contained in groff_mdoc(7), catch up with our own stuff.
In particular, allow ERRORS in section 4 and DIAGNOSTICS in section 9.
ok jmc@
Ingo Schwarze [Fri, 14 Feb 2014 23:24:26 +0000 (23:24 +0000)]
Parse and ignore the roff(7) .ce request (center some lines).
We even parse and ignore the .ad request (adjustment mode),
and it doesn't make sense to more prominently warn about
temporary than about permanent adjustment changes.
Request found by naddy@ in xloadimage(1) and by juanfra@ in racket(1).
Ingo Schwarze [Fri, 14 Feb 2014 23:05:20 +0000 (23:05 +0000)]
Implement the roff(7) .as request (append to user-defined string).
Missing feature found by jca@ in ratpoison(1).
The ratpoison(1) manual still doesn't work because it uses .shift
and .while, too (apparently, ratpoison is so complex that it
needs a Turing-complete language to even format its manual :-).
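A minimal sketch of .ds and .as semantics, assuming a hypothetical singly linked string table with no escape processing. In this sketch, appending to an undefined string simply defines it.

```c
#include <stdlib.h>
#include <string.h>

/* One user-defined string; hypothetical layout. */
struct ustr {
	char		*name;
	char		*val;
	struct ustr	*next;
};

static struct ustr *ustrs;	/* head of the string table */

static char *
dupstr(const char *s)
{
	size_t	 len = strlen(s) + 1;
	char	*p = malloc(len);

	if (p != NULL)
		memcpy(p, s, len);
	return p;
}

static struct ustr *
ustr_find(const char *name)
{
	struct ustr	*s;

	for (s = ustrs; s != NULL; s = s->next)
		if (strcmp(s->name, name) == 0)
			return s;
	return NULL;
}

/* .ds name val (append == 0) or .as name val (append != 0).
 * Returns 0 on success, -1 on allocation failure. */
int
ustr_set(const char *name, const char *val, int append)
{
	struct ustr	*s;
	char		*nv;
	size_t		 oldlen;

	s = ustr_find(name);
	if (append && s != NULL) {
		/* .as on an existing string: grow its value. */
		oldlen = strlen(s->val);
		if ((nv = realloc(s->val, oldlen + strlen(val) + 1)) == NULL)
			return -1;
		strcpy(nv + oldlen, val);
		s->val = nv;
		return 0;
	}
	/* .ds, or .as on an undefined string: (re)define it. */
	if ((nv = dupstr(val)) == NULL)
		return -1;
	if (s == NULL) {
		if ((s = calloc(1, sizeof(*s))) == NULL ||
		    (s->name = dupstr(name)) == NULL) {
			free(s);
			free(nv);
			return -1;
		}
		s->next = ustrs;
		ustrs = s;
	}
	free(s->val);
	s->val = nv;
	return 0;
}
```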
Ingo Schwarze [Fri, 14 Feb 2014 22:27:41 +0000 (22:27 +0000)]
Handle some predefined read-only number registers, e.g. .H and .V.
In particular, this improves handling of the pod2man(1) preamble;
for examples of the effect, see some author names in perlthrtut(1).
Missing feature reported by Andreas Voegele <mail at andreasvoegele dot com>
more than two years ago. Written at Christchurch International Airport.
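A sketch of a read-only register lookup, assuming hypothetical roreg_get() and reg_set_allowed() helpers. The values are modelled on groff's nroff-style ASCII device, whose horizontal and vertical resolutions are 24 and 40 basic units; mandoc's actual values and register list may differ.

```c
#include <string.h>

/* Value of a predefined read-only number register, or -1 if
 * the name is not one of them.  Values are assumptions. */
int
roreg_get(const char *name)
{
	if (strcmp(name, ".H") == 0)
		return 24;	/* horizontal resolution, basic units */
	if (strcmp(name, ".V") == 0)
		return 40;	/* vertical resolution, basic units */
	return -1;
}

/* An .nr request may only set registers that are not read-only. */
int
reg_set_allowed(const char *name)
{
	return roreg_get(name) == -1;
}
```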
Ingo Schwarze [Fri, 24 Jan 2014 22:54:33 +0000 (22:54 +0000)]
Supplement the documentation of the .St macro by minimal commentary
regarding the content and relationships of the various standards,
and sort and group them.
tweaks and ok guenther@, ok millert@ sobrado@ jmc@
Ingo Schwarze [Sun, 19 Jan 2014 00:09:38 +0000 (00:09 +0000)]
Support a second -v on mandocdb(8) to show keys while they are being added;
i need that for debugging, in particular to be used with -t.
To be able to do so, provide a global table of key names, for reuse.
Ingo Schwarze [Sat, 18 Jan 2014 08:23:55 +0000 (08:23 +0000)]
Sort the macro keys by their real-world frequency to reduce the average
mask size. No functional change.
This shrinks the standard /usr/share/man database by 7%, now at 10.3x
the size of whatis.db, and with -Q even by 11%, now at 3.0x of whatis.db.
Now i'm out of ideas to easily shrink the size of the database.
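Why reordering bits can shrink the database: a sketch, assuming the key masks are stored as SQLite INTEGER values. SQLite's record format stores an integer in 0, 1, 2, 3, 4, 6, or 8 bytes depending on its magnitude, so if the most frequent macro keys get the lowest bit positions, the typical mask is a small number and needs fewer bytes. sqlite_int_bytes() is a hypothetical helper reproducing that size rule, not an SQLite API.

```c
#include <stdint.h>

/* Bytes the SQLite record format uses for an integer payload:
 * serial types 1-6, plus 8 and 9 (zero bytes) for the constants
 * 0 and 1 in schema format 4 and later. */
int
sqlite_int_bytes(int64_t v)
{
	if (v == 0 || v == 1)
		return 0;
	if (v >= -128 && v <= 127)
		return 1;
	if (v >= -32768 && v <= 32767)
		return 2;
	if (v >= -8388608 && v <= 8388607)
		return 3;
	if (v >= INT32_MIN && v <= INT32_MAX)
		return 4;
	if (v >= -(INT64_C(1) << 47) && v < (INT64_C(1) << 47))
		return 6;
	return 8;
}
```

For example, a key seen only in the most common macro gets mask 1 if that macro owns bit 0 (zero bytes of payload), but 1 << 20 (three bytes) if it happens to own bit 20.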
Ingo Schwarze [Sat, 18 Jan 2014 08:21:03 +0000 (08:21 +0000)]
Drop the AUTOINCREMENT PRIMARY KEYs from the mlinks and keys tables.
They are completely unused, and i cannot imagine what they *could*
ever be used for; but apparently, they are expensive to generate.
Standard DB build time goes down by 10%, now at 1.9x of makewhatis.
Standard DB size goes down by 4%, now at 11x of makewhatis.
DB build time with -Q goes down by 15%, now at 0.28x of makewhatis.
DB size with -Q goes down by 3%, now at 3.35x of makewhatis.
Ingo Schwarze [Sat, 18 Jan 2014 08:19:18 +0000 (08:19 +0000)]
Despite some experimenting, i'm unable to find any relevant effect of
creating an index for the keys table on apropos(1) search times;
apparently, adding that index was premature optimization in the first
place; so, stop adding that index.
Its root gone, the following evil is reduced (/usr/share/man on my notebook)
- DB build time with -Q goes down by 15%, now at 1/3 of makewhatis
- DB size with -Q goes down by 35%, now at 3.5x of makewhatis
- full DB build time goes down by 12%, now at 2.1x of makewhatis
- full DB size goes down by 42%, now at 11.5x of makewhatis
Ingo Schwarze [Tue, 7 Jan 2014 09:10:45 +0000 (09:10 +0000)]
Cache the result of uname(3) so that we don't need to call it
over and over again for each manual; found with gprof(1).
Speeds up mandocdb(8) -Q by 3%, now at 39.5% of makewhatis(8).
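A sketch of the caching, assuming the machine architecture (utsname.machine) is the field needed; the message does not say which field mandocdb(8) uses, and machine_arch() is a hypothetical name. uname(3) runs once, and every later call only tests a static flag.

```c
#include <stddef.h>
#include <sys/utsname.h>

/* Return the machine architecture, calling uname(3) at most once.
 * Returns NULL if uname(3) failed.  Not thread-safe. */
const char *
machine_arch(void)
{
	static struct utsname	 utsname;
	static int		 state;	/* 0 unknown, 1 ok, -1 failed */

	if (state == 0)
		state = uname(&utsname) == -1 ? -1 : 1;
	return state == 1 ? utsname.machine : NULL;
}
```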
Ingo Schwarze [Mon, 6 Jan 2014 23:46:07 +0000 (23:46 +0000)]
Gprof(1) is fun. You should use it more often.
Another 10% speedup for mandocdb(8) -Q, and even 3% without -Q.
With -Q, we are now at 41% of the time required by makewhatis(8).
Do not copy predefined strings into the dynamic string table, just
leave them in their own static table and use that one as a fallback
at lookup time. This saves us copying and deleting them for each manual.
No functional change.
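A sketch of the lookup order, assuming a hypothetical str_lookup() and illustrative table entries. User-defined strings are searched first, so they still override predefined ones; only on a miss does the lookup fall back to the static table, which therefore never needs to be copied or freed per manual.

```c
#include <stddef.h>
#include <string.h>

struct str {
	const char		*name;
	const char		*val;
	const struct str	*next;
};

/* Illustrative predefined strings, kept in static storage. */
static const struct str predefs[] = {
	{ "Lq", "\\(lq", NULL },
	{ "Rq", "\\(rq", NULL },
};

/* Search the per-manual dynamic list first, then the static table.
 * Returns NULL if the string is not defined at all. */
const char *
str_lookup(const struct str *dynamic, const char *name)
{
	const struct str	*s;
	size_t			 i;

	for (s = dynamic; s != NULL; s = s->next)
		if (strcmp(s->name, name) == 0)
			return s->val;
	for (i = 0; i < sizeof(predefs) / sizeof(predefs[0]); i++)
		if (strcmp(predefs[i].name, name) == 0)
			return predefs[i].val;
	return NULL;
}
```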
Ingo Schwarze [Mon, 6 Jan 2014 22:39:25 +0000 (22:39 +0000)]
Another 18% speedup for mandocdb(8) -Q, found by gprof(1).
In -Q mode, refrain from validating and normalizing the format
of the date given in .Dd or .TH, as it won't be used anyway.
For /usr/share/man, mandocdb -Q now takes 45% of the time of makewhatis(8).
Ingo Schwarze [Mon, 6 Jan 2014 21:34:31 +0000 (21:34 +0000)]
Another 25% speedup for mandocdb(8) -Q mode, found with gprof(1).
For /usr/share/man, we only need 56% of the time of makewhatis(8) now.
In groff, user-defined macros clashing with mdoc(7) or man(7)
standard macros are cleared when parsing the .Dd or .TH macro,
respectively. Of course, we continue doing that in standard mode
to assure full groff bug compatibility.
However, in -Q mode, full groff bug compatibility makes no sense
when it's unreasonably expensive, so skip this step in -Q mode.
Real-world manuals hardly ever redefine standard macros,
that's terrible style, and if they do, it's pointless to do so
before .Dd or .TH because it has no effect. Even if someone does,
it's extremely unlikely to break mandocdb(8) -Q parsing because we
abort the parse sequence after the NAME section, anyway.
So if you manually redefine .Sh, .Nm, .Nd, or .SH in a way that doesn't
work at all and rely on .Dd or .TH to fix it up for you, your broken
manual will no longer get a perfect apropos(1) entry until you re-run
mandocdb(8) without -Q. I think that consequence is acceptable
in order to get a 25% speedup for everyone else.
Ingo Schwarze [Mon, 6 Jan 2014 20:53:40 +0000 (20:53 +0000)]
Do not sync to disk after each individual manual page (duh!),
only sync to disk one single time when all data is ready.
Rebuild times for /usr/share/man/mandoc.db shrink on my notebook:
In standard mode from 45 seconds to 11 seconds (75% reduction)
In -Q mode from 25 seconds to 3.1 seconds (87% reduction)
For comparison: makewhatis(8): 4.2 seconds
That is, in -Q mode, we are now *faster* than the existing makewhatis(8),
and careful profiling shows there is still a lot of room for improvement.
Ingo Schwarze [Mon, 6 Jan 2014 03:52:13 +0000 (03:52 +0000)]
Remove the redundant "file" column from the "mlinks" table.
The contents can easily be reconstructed from sec, arch, name, form.
Shrinks the database by another 3% in standard mode and 9% in -Q mode.
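A sketch of how the file name can be rebuilt from the remaining columns, assuming the conventional layout: source manuals in man<sec>/, preformatted ones in cat<sec>/ with the suffix .0, and an optional architecture subdirectory. mlink_path() and the FORM_* values are hypothetical.

```c
#include <stdio.h>
#include <string.h>

enum form { FORM_SRC, FORM_CAT };	/* hypothetical format codes */

/* Write the relative path of a manual into buf.
 * Returns the snprintf(3) result. */
int
mlink_path(char *buf, size_t bsz, const char *sec, const char *arch,
    const char *name, enum form form)
{
	int	 has_arch = arch != NULL && *arch != '\0';

	return snprintf(buf, bsz, "%s%s/%s%s%s.%s",
	    form == FORM_SRC ? "man" : "cat", sec,
	    has_arch ? arch : "", has_arch ? "/" : "",
	    name, form == FORM_SRC ? sec : "0");
}
```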
Ingo Schwarze [Mon, 6 Jan 2014 03:02:46 +0000 (03:02 +0000)]
Drop Nd from the mpages table, it is still in the keys table.
This shrinks the database in standard mode by 3%, in -Q mode by 9%,
without loss of functionality.
Ingo Schwarze [Mon, 6 Jan 2014 00:53:33 +0000 (00:53 +0000)]
Joerg Sonnenberger contributed copyrightable amounts of text to
some files. To make it clear that he also put his contributions
under the ISC license, with his explicit permission, add his
Copyright notice to the relevant files. No code change.
Ingo Schwarze [Sun, 5 Jan 2014 20:26:36 +0000 (20:26 +0000)]
Add an option -Q (quick) to mandocdb(8)
for accelerated generation of reduced-size databases.
Implement this by allowing the parsers to optionally
abort the parse sequence after the NAME section.
While here, garbage collect the unused void *arg attribute of
struct mparse and mparse_alloc() and fix some errors in mandoc(3).
This reduces the processing time of mandocdb(8) on /usr/share/man
by a factor of 2 and the database size by a factor of 4.
However, it still takes 5 times the time and 6 times the space
of makewhatis(8), so more work is clearly needed.
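A line-based sketch of the quick mode, assuming mdoc(7) input and a hypothetical scan_until_after_name(); the real parsers abort based on parsed macros rather than raw text. Reading stops at the first .Sh that follows .Sh NAME, so the rest of the manual is never processed.

```c
#include <stdio.h>
#include <string.h>

/* Consume lines until the section after NAME begins.
 * Returns the number of lines read. */
size_t
scan_until_after_name(FILE *fp)
{
	char	 line[1024];
	size_t	 n = 0;
	int	 in_name = 0;

	while (fgets(line, sizeof(line), fp) != NULL) {
		n++;
		if (strncmp(line, ".Sh ", 4) != 0)
			continue;
		if (in_name)
			break;	/* next section: abort the parse */
		in_name = strncmp(line + 4, "NAME", 4) == 0;
	}
	return n;
}
```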
Tag functions with format strings as arguments as printf-like.
Fix one case where a non-literal is used as format string.
Fix another case where a variable is formatted using the wrong type.
Ingo Schwarze [Sun, 5 Jan 2014 04:48:40 +0000 (04:48 +0000)]
Rip out the complete "reachable" checks, without replacement.
It's a pity i spent time during t2k13 writing this; however,
when an entire concept is busted, let us not look back.
There is no such thing as an unreachable page. Even if you are crazy
enough to put a page starting with ".Dt NAMEI 9" into a file man1/cat.1,
we now make sure that it can be found by all of the following:
Nm=namei Nm=cat sec=1 sec=9
It will always be displayed as:
cat(1) - pathname lookup
So you know that you have to type `man cat` to get at it.
That obsoletes the concept of "unreachable manuals" for good.
Ingo Schwarze [Sun, 5 Jan 2014 04:13:52 +0000 (04:13 +0000)]
Remove the obsolete file name column from the mpages table.
This column wasn't helpful because one manpage can have multiple MLINKS.
Use the file name column in the mlinks table, instead.
Ingo Schwarze [Sun, 5 Jan 2014 03:25:51 +0000 (03:25 +0000)]
Remove the obsolete sec and arch columns from the mpages table.
They were confusing because a manpage can have MLINKS in different
sections and architectures.
Ingo Schwarze [Sun, 5 Jan 2014 03:06:43 +0000 (03:06 +0000)]
Reimplement apropos -s NUM -S ARCH EXPR by internally converting it to
apropos \( EXPR \) -a 'sec~^NUM$' -a 'arch~^(ARCH|any)$'
in preparation for removal of sec and arch from the mpages table.
Almost no functional change except for the following bonus:
This also makes sure that for cross-section and cross-arch MLINKS,
all of the following work:
apropos -s 1 encrypt
apropos -s 8 encrypt
apropos -s 1 makekey
apropos -s 8 makekey
While here, print error messages about invalid regexps to stderr.
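A sketch of the rewritten section filter using POSIX regcomp(3) and regexec(3), assuming extended regular expressions and a hypothetical section_matches() helper. Anchoring with ^ and $ makes -s 1 match section 1 exactly, and a pattern that fails to compile is reported on stderr. The arch filter 'arch~^(ARCH|any)$' would be built the same way.

```c
#include <regex.h>
#include <stdio.h>

/* Build '^NUM$', compile it, and match it against a section string.
 * Returns 1 on match, 0 on no match, -1 on error. */
int
section_matches(const char *num, const char *sec)
{
	char	 pattern[64];
	char	 errbuf[128];
	regex_t	 re;
	int	 rc;

	if ((size_t)snprintf(pattern, sizeof(pattern), "^%s$", num) >=
	    sizeof(pattern))
		return -1;
	if ((rc = regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB)) != 0) {
		regerror(rc, &re, errbuf, sizeof(errbuf));
		fprintf(stderr, "%s: %s\n", pattern, errbuf);
		return -1;
	}
	rc = regexec(&re, sec, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```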
Ingo Schwarze [Sun, 5 Jan 2014 00:29:54 +0000 (00:29 +0000)]
Put section and architecture info into the keys table,
in preparation for removing them from the mpages table,
aiming for cleaner and more uniform interfaces.
Database growth is below 4%, part of which will be reclaimed.
As a bonus, this allows searches like:
./obj/apropos An=kettenis -a arch=ppc
./obj/apropos An=kettenis -a sec~[^4]
Ingo Schwarze [Sat, 4 Jan 2014 23:43:53 +0000 (23:43 +0000)]
New implementation of complex search criteria using \(, \), -a because
the old implementation got lost in the Berkeley to SQLite switch.
Note that this is not just feature creep, but required for upcoming
database format cleanup and simplification.
Ingo Schwarze [Sat, 4 Jan 2014 13:40:01 +0000 (13:40 +0000)]
Even though strnlen(3) is required by POSIX 2008,
Matthias Scheler reports that Solaris 10 lacks it.
While here, sort the declarations in config.h
and move the headers to the top.
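A sketch of a portable fallback for systems lacking strnlen(3). The name compat_strnlen() is hypothetical, chosen to avoid clashing with a system declaration; a build would presumably use it only when the configure test finds no native strnlen(3). It never reads past s[maxlen - 1].

```c
#include <stddef.h>

/* Length of s, but at most maxlen. */
size_t
compat_strnlen(const char *s, size_t maxlen)
{
	size_t	 len;

	for (len = 0; len < maxlen && s[len] != '\0'; len++)
		continue;
	return len;
}
```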
Ingo Schwarze [Sat, 4 Jan 2014 01:11:00 +0000 (01:11 +0000)]
Clean up feature tests:
* Split the configure steering script out of the Makefile.
* Let the configure step depend on the test sources.
* Clean up the test programs such that they can be run.
Ingo Schwarze [Thu, 2 Jan 2014 22:44:10 +0000 (22:44 +0000)]
Avoid "utf8" in the names of a function and a struct member
that don't necessarily have anything to do with UTF-8.
Just renaming, no functional change.
Ingo Schwarze [Thu, 2 Jan 2014 18:52:15 +0000 (18:52 +0000)]
Check all MLINKS for consistency with the content of the manual page,
not just the first one. This doesn't change how the check is done,
but just which MLINKS are checked.
Ingo Schwarze [Thu, 2 Jan 2014 16:29:55 +0000 (16:29 +0000)]
Since the functions in read.c are part of the mandoc(3) library,
do not print to stderr. Instead, properly use the mmsg callback.
Issue noticed by Abhinav Upadhyay <er dot abhinav dot upadhyay
at gmail dot com> and Thomas Klausner <wiz at NetBSD>.
Ingo Schwarze [Tue, 31 Dec 2013 23:29:41 +0000 (23:29 +0000)]
Support .St -p1003.1-2013, "IEEE Std 1003.1-2008/Cor 1-2013".
Note that the POSIX-2008 standard remains in force, so please refrain
from wholesale 2008 -> 2013 replacements. Make sure to only use the
new -p1003.1-2013 argument for cases where "IEEE Std 1003.1(TM)-2008/
Cor 1-2013, IEEE Standard for Information Technology--Portable
Operating System Interface (POSIX(R)), Technical Corrigendum 1"
actually changes something in the standard with respect to the
specific function documented in the manual you touch. Otherwise,
please continue using .St -p1003.1-2008.
Triggered by a similar, but slightly incorrect patch from jmc@;
ok guenther@.
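A sketch of an .St lookup table holding only the two arguments discussed here, with their text taken from this message. The st_lookup() name is hypothetical; the real mdoc(7) table has many more entries and may render them with additional text.

```c
#include <stddef.h>
#include <string.h>

static const struct {
	const char	*arg;	/* .St argument */
	const char	*text;	/* text to print */
} standards[] = {
	{ "-p1003.1-2008", "IEEE Std 1003.1-2008" },
	{ "-p1003.1-2013", "IEEE Std 1003.1-2008/Cor 1-2013" },
};

/* Returns the text for an .St argument, or NULL if unknown. */
const char *
st_lookup(const char *arg)
{
	size_t	 i;

	for (i = 0; i < sizeof(standards) / sizeof(standards[0]); i++)
		if (strcmp(standards[i].arg, arg) == 0)
			return standards[i].text;
	return NULL;
}
```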
Ingo Schwarze [Tue, 31 Dec 2013 22:40:12 +0000 (22:40 +0000)]
Do not trigger end-of-sentence spacing by trailing punctuation
at the end of partial implicit macros. Prodded by jmc@.
Actually, this is a revert of rev. 1.64 Fri May 14 14:09:13 2010 UTC
by kristaps@, with this original commit message:
"Block-implicit macros now up-propogate end-of-sentence spacing.
NOTE: GROFF IS NOT SMART ENOUGH TO DO THIS."
Please speak after me: Then why the hell should we?
We already weakened this in rev. 1.93 Sun Jul 18 17:00:26 2010 UTC,
but that weakening was insufficient. Let's take it out completely.
Admittedly, there are two places in OpenBSD base where what Kristaps
did make the output nicer, in calloc(3) and in fish(6). But both are
atypical. There are 18 other places where this revert makes the
output nicer, the typical case being:
"Mail status is shown as ``No Mail.'' if there is no mail."
You do *not* want the EOS spacing after ``No Mail.'' in that sentence.
Ingo Schwarze [Tue, 31 Dec 2013 19:40:20 +0000 (19:40 +0000)]
Yet another regression introduced by Kristaps when he switched from
Berkeley DB to SQLite3: In the .In parser, the logic got inverted.
The resulting NULL pointer access was found by clang;
scan log provided by Ulrich Spoerlein <uqs at FreeBSD>.
The best fix is to simply remove the whole, pointless custom
handler function for .In and let the framework do its work.
Now searching for included header files actually works.
While here, remove the similarly pointless custom .St handler,
fix the return value of the .Fd handler and disentangle the
spaghetti in the .Nm handler.
Ingo Schwarze [Tue, 31 Dec 2013 18:07:42 +0000 (18:07 +0000)]
remove assignments that will be overwritten right afterwards,
and remove pointless local variables;
found in a clang output from Ulrich Spoerlein <uqs at FreeBSD>
Ingo Schwarze [Tue, 31 Dec 2013 03:41:14 +0000 (03:41 +0000)]
Experimental feature to let apropos(1) show different keys than .Nd.
This really takes us beyond what grep -R /usr/*/man/ can do
because now you can search for pages by *one* criterion and then
display the contents of *another* macro from those pages, like in
$ apropos -O Ox Fa~wchar
to get an impression how long wide character handling is available.
Ingo Schwarze [Mon, 30 Dec 2013 18:44:06 +0000 (18:44 +0000)]
Oops, missed one:
Remove duplicate const specifier from a call to mandoc_escape().
Found by Thomas Klausner <wiz at NetBSD dot org> using clang.
No functional change.
Ingo Schwarze [Mon, 30 Dec 2013 18:30:32 +0000 (18:30 +0000)]
Remove duplicate const specifiers from the declaration of mandoc_escape().
Found by Thomas Klausner <wiz at NetBSD dot org> using clang.
No functional change.
Ingo Schwarze [Fri, 27 Dec 2013 20:35:51 +0000 (20:35 +0000)]
Split mlinks_undupe() out of mpages_merge()
such that the check for source manuals of the same name
can be done for multiple mlinks pointing to the same preformatted mpage.
Ingo Schwarze [Fri, 27 Dec 2013 18:51:25 +0000 (18:51 +0000)]
Change the mansearch() interface to use the mlinks table in the database
and return a list of names with sections, used by apropos(1) for display.
While here, improve uniformity of the interface by allocating the file
name dynamically, just like the names list and the description.
Ingo Schwarze [Fri, 27 Dec 2013 16:17:32 +0000 (16:17 +0000)]
Allow saving more than one mlink per mpage in the mlinks ohash.
We are still only using one of them for now.
Actually, we are now using a different one,
but the order the mlinks are found is random anyway.
Ingo Schwarze [Fri, 27 Dec 2013 14:29:28 +0000 (14:29 +0000)]
Another step on the way to clear naming, this time regarding mlinks:
* rename global ohash filenames to mlinks
* rename ofadd() to mlink_add()
* fold fileadd() and inoadd() into mlink_add()
* fold filecheck() into mpages_merge()
Still no functional change.
Ingo Schwarze [Fri, 27 Dec 2013 01:16:54 +0000 (01:16 +0000)]
Add an additional mlinks table to the database, redundant for now,
both because it contains nothing but a subset of the data of the
existing mpages table and because the relationship of mpage and mlink
entries is still 1:1. But all that will eventually change.
Ingo Schwarze [Thu, 26 Dec 2013 23:35:59 +0000 (23:35 +0000)]
Drop the mpages_list, use the existing mpages ohash for iteration.
No functional change except that the order of database entries changes,
which doesn't matter anyway.
Ingo Schwarze [Thu, 26 Dec 2013 22:12:46 +0000 (22:12 +0000)]
To better support MLINKS, we will have to split the "docs" database
table into two tables, one for actual files on disk, one for (often
multiple) directory entries pointing to them. That implies splitting
struct of into two structs, to be called "mpage" and "mlink",
respectively. As a preparation, globally rename "of" and "inos"
to "mpage". No functional change.
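The planned split can be sketched as two structs with a 1:N relationship. The field names and the helper functions are hypothetical; only the mpage and mlink names come from this message.

```c
#include <stdlib.h>

struct mlink;

/* One manual file on disk. */
struct mpage {
	struct mlink	*mlinks;	/* all directory entries pointing here */
};

/* One directory entry (MLINK) pointing to an mpage. */
struct mlink {
	char		 dsec[8];	/* section from the directory name */
	char		 name[64];	/* file name without suffix */
	struct mpage	*mpage;		/* back pointer */
	struct mlink	*next;		/* next mlink of the same mpage */
};

/* Attach an mlink to its mpage. */
void
mlink_attach(struct mpage *mpage, struct mlink *mlink)
{
	mlink->mpage = mpage;
	mlink->next = mpage->mlinks;
	mpage->mlinks = mlink;
}

size_t
mpage_count_mlinks(const struct mpage *mpage)
{
	const struct mlink	*ml;
	size_t			 n = 0;

	for (ml = mpage->mlinks; ml != NULL; ml = ml->next)
		n++;
	return n;
}
```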
Ingo Schwarze [Thu, 26 Dec 2013 17:23:42 +0000 (17:23 +0000)]
Rework the documentation of Spaces, using the Ossanna/Kernighan/Ritter
Heirloom Nroff/Troff User's Manual as the authoritative reference.
Part of our text was outright wrong.
Also, refrain from advertising the paddable non-breaking space `\~'
in the DESCRIPTION, for three reasons: For nroff mode, -Tascii, and
fixed width fonts in general, it makes no difference, so keep the
discussion simple. Compared to `\ ', `\~' is of questionable portability.
And if you want to keep words together, it is also more usual that you
don't want padding to intervene either.
Finally, drop the `\c' escape sequence (interrupt text processing)
which is not a special character but an input processing instruction
akin to the \<newline> escape sequence.