When checking cross references with -Tlint, ultimately fall back to
looking in the current working directory. Not a security issue
because the files are never open(2)ed, only access(2)ed.
Requested by jmc@ and inspired by mdoclint(1).
This cannot be perfect because it only works for files having the
exact filename ./pagename.sec - mandoc has no way to figure out
which files might contain a manual for multiple names, or that files
in autohell might be called ./pagename.man.in instead, or which
subdirectories might contain additional source files. Also, it may
hide messages if you have bogus stuff lying around in the directory
where you run mandoc -Tlint. But jmc@ considers it important, and
good enough for everyday use.
Also avoid leaking the memory for the file name while here.
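A minimal sketch of the current-directory check described above. It assumes the candidate path is exactly ./pagename.sec, and the function name xr_in_cwd() is hypothetical, not taken from the mandoc sources. The file is only tested with access(2), never opened, and the path buffer is freed before returning:

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: return 1 if "./<name>.<sec>" exists in the
 * current working directory, 0 otherwise.  The file is only
 * checked with access(2), never opened.
 */
static int
xr_in_cwd(const char *name, const char *sec)
{
	char	*path;
	int	 len, found;

	len = snprintf(NULL, 0, "./%s.%s", name, sec);
	if (len < 0 || (path = malloc((size_t)len + 1)) == NULL)
		return 0;
	snprintf(path, (size_t)len + 1, "./%s.%s", name, sec);
	found = access(path, F_OK) == 0;
	free(path);	/* do not leak the file name */
	return found;
}
```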
When checking the validity of cross references with -Tlint,
fall back from database search to file system search
just like man(1) does when looking up manuals.
This is not too expensive because on a system having up-to-date
mandoc.db(5) files, it only prolongs the time needed to check
*invalid* references - and you are not supposed to have many of
those, right? And on a system with missing or invalid mandoc.db(5)
files, spending a bit of time and warning loudly about the real
problem is also better than quickly issuing bogus warnings about
cross references that are actually valid.
Basic reporting of .Xrs to manual pages that don't exist
in the base system, inspired by mdoclint(1).
We are able to do this because (1) the -mdoc parser, the -Tlint validator,
and the man(1) manual page lookup code are all in the same program
and (2) the mandoc.db(5) database format allows fast lookup.
Feedback from, previous versions tested by, and OK jmc@.
A few features will be added to this in the tree, step by step.
Ingo Schwarze [Thu, 29 Jun 2017 16:31:15 +0000 (16:31 +0000)]
Skip whitespace at the beginning of eqn(7) nodes,
in particular ~ and ^, which were misrendered;
found by bentley@ in glCopyTexSubImage1D(3); also affected
glAccum(3), glClipPlane(3), glDrawPixels(3), glEvalMesh(3), and others.
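A minimal sketch of the skipping step, assuming a node's text is a NUL-terminated string and that blanks, tabs, and the eqn(7) spacing characters ~ (full space) and ^ (half space) are the only characters to drop; the function name is hypothetical:

```c
/*
 * Hypothetical helper: return a pointer to the first character of
 * an eqn(7) node's text that is neither whitespace nor one of the
 * spacing characters '~' and '^'.
 */
static const char *
eqn_skip_leading(const char *text)
{
	while (*text == ' ' || *text == '\t' ||
	    *text == '~' || *text == '^')
		text++;
	return text;
}
```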
Ingo Schwarze [Wed, 28 Jun 2017 00:59:57 +0000 (00:59 +0000)]
Rewrite half of this; I was completely unaware how bad it was.
Remove several lies, lots of duplicate information,
and a lengthy discussion of features we don't support.
Clarify the wording in some places and make it more concise in others.
Delete examples from where they don't belong
and write a new EXAMPLES section from scratch.
Ingo Schwarze [Tue, 27 Jun 2017 18:25:02 +0000 (18:25 +0000)]
Implement spacing of columns as defined in the table layout;
this is for example used by lftp(1)
and, ironically, misused by our very own tbl(7) manual...
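The arithmetic behind layout-defined column spacing can be sketched as follows. It assumes widths and separations are counted in terminal columns (ens) and that a column without an explicit separation number in the layout, as in "l2 l l", gets the tbl(7) default of 3; all names are hypothetical:

```c
#include <stddef.h>

#define DEFAULT_SPACING 3	/* tbl(7) default column separation, in ens */

/*
 * Hypothetical helper: compute the starting position of each of
 * the ncols columns.  spacing[i] is the separation requested in the
 * layout after column i, or -1 if the layout specifies none.
 */
static void
column_positions(const size_t *width, const int *spacing,
    size_t ncols, size_t *pos)
{
	size_t	 i, sep;

	if (ncols == 0)
		return;
	pos[0] = 0;
	for (i = 1; i < ncols; i++) {
		sep = spacing[i - 1] < 0 ?
		    DEFAULT_SPACING : (size_t)spacing[i - 1];
		pos[i] = pos[i - 1] + width[i - 1] + sep;
	}
}
```

With column widths 5, 3, and 4 and a layout of "l2 l l", the columns start at positions 0, 7, and 13.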
Ingo Schwarze [Mon, 26 Jun 2017 20:09:04 +0000 (20:09 +0000)]
Complete rewrite of the lexer as a single function with four operation
modes instead of four functions, resulting in considerable
simplification, fifty fewer lines of code, fifteen fewer automatic
variables, and several bug fixes, for example:
1. The delim control statement consumes exactly two bytes of input,
requires no whitespace after these two bytes, and does not treat
quotes in any special way.
2. If the argument of left, right, gfont, gsize, or size is defined
as an alias, only the first word of the value is used as the
delimiter, font name, or font size.
3. If a back, fwd, down, or up keyword is followed by another keyword
instead of the required number, GNU eqn does nothing useful, but
typically errors out. So there is no need for special handling
(with an ugly goto!) of this case in mandoc.
Also getting rid of one pointless static buffer and twelve redundant
calls to strlcpy(3).
Ingo Schwarze [Sun, 25 Jun 2017 17:43:45 +0000 (17:43 +0000)]
Catch typos in .Sh names; suggested by jmc@.
I'm using a very simple, linear time / zero space fuzzy string
matching heuristic rather than a full Levenshtein metric, to keep
the code both simple and fast.
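The entry does not spell out the heuristic. One example of a check that runs in linear time with constant extra space is sketched below: it accepts two section names that are equal or differ by exactly one substituted, inserted, or deleted character, ignoring case. This is an illustration only, not necessarily what mandoc does, and the function name is hypothetical:

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if a and b are equal or differ by
 * exactly one substituted, inserted, or deleted character
 * (case-insensitive), 0 otherwise.  Linear time, no extra storage.
 */
static int
similar_name(const char *a, const char *b)
{
	size_t	 la, lb;
	int	 edits = 0;

	la = strlen(a);
	lb = strlen(b);
	if (la > lb + 1 || lb > la + 1)
		return 0;
	while (*a != '\0' && *b != '\0') {
		if (toupper((unsigned char)*a) ==
		    toupper((unsigned char)*b)) {
			a++;
			b++;
			continue;
		}
		if (++edits > 1)
			return 0;
		if (la > lb)		/* skip the extra character in a */
			a++;
		else if (lb > la)	/* skip the extra character in b */
			b++;
		else {			/* substitution */
			a++;
			b++;
		}
	}
	/* At most one trailing character may remain in the longer string. */
	return (size_t)edits + strlen(a) + strlen(b) <= 1;
}
```

Transpositions such as EXAMPLSE count as two edits in this sketch and are therefore not caught.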
Ingo Schwarze [Sat, 24 Jun 2017 14:38:32 +0000 (14:38 +0000)]
Split -Wstyle into -Wstyle and the even lower -Wbase, and add
-Wopenbsd and -Wnetbsd to check conventions for the base system of
a specific operating system. Mark operating system specific messages
with "(OpenBSD)" at the end.
Please use just "-Tlint" to check base system manuals (defaulting
to -Wall, which is now -Wbase), but prefer "-Tlint -Wstyle" for the
manuals of portable software projects you maintain that are not
part of OpenBSD base, to avoid bogus recommendations about base
system conventions that do not apply.
Issue originally reported by semarie@, solution using
an idea from tedu@, discussed with jmc@ and jca@.
Ingo Schwarze [Sat, 24 Jun 2017 13:49:29 +0000 (13:49 +0000)]
Delete .St -p1003.1-2013.
It is an OpenBSD addition that did not get used a single time in
three years, and groff did not pick it up either, so removing it
does not affect any existing manuals anywhere.
Cleanup suggested by jmc@, OK bentley@.
Ingo Schwarze [Fri, 23 Jun 2017 23:00:01 +0000 (23:00 +0000)]
Consistently treat character escape sequences as operators,
not as letters, even if their names contain letters.
This is certainly not perfect, but code to recognize that \(*a is
not an operator but a letter would need a huge table, or Unicode
character property support, which won't happen at this time.
Ingo Schwarze [Fri, 23 Jun 2017 02:32:12 +0000 (02:32 +0000)]
Write text boxes as <mi>, <mn>, or <mo> as appropriate,
and write fontstyle or fontweight attributes where required.
Missing features reported by bentley@.
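A minimal sketch of how a text box might be mapped to a MathML element under the rules stated in this and the preceding entry: character escape sequences count as operators, digit strings as numbers, and letter strings as identifiers. The enum and function names are hypothetical, and treating everything else as an operator is an assumption:

```c
#include <ctype.h>

enum mml_elem { MML_MI, MML_MN, MML_MO };

/* Hypothetical classifier for the text of one eqn(7) text box. */
static enum mml_elem
mml_classify(const char *text)
{
	const char	*p;

	/* Escapes like \(*a are operators, even if they contain letters. */
	if (*text == '\\')
		return MML_MO;

	/* A leading digit followed only by digits and dots: a number. */
	if (isdigit((unsigned char)*text)) {
		for (p = text; *p != '\0'; p++)
			if (!isdigit((unsigned char)*p) && *p != '.')
				break;
		if (*p == '\0')
			return MML_MN;
	}

	/* A non-empty run of letters: an identifier. */
	for (p = text; *p != '\0'; p++)
		if (!isalpha((unsigned char)*p))
			return MML_MO;
	return p == text ? MML_MO : MML_MI;
}
```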
Ingo Schwarze [Fri, 23 Jun 2017 00:30:38 +0000 (00:30 +0000)]
Simplify font handling:
1. Inherit the font attribute from the parent box, such that iteration
is no longer required to find the current font.
2. For well-known function name tokens, do not insert an EQN_LISTONE
box into the AST; simply set the font attribute of the text box
itself that contains the name.
Also improve word splitting of unquoted strings in default font mode:
3. Split between numbers and punctuation because both will soon get
different HTML markup.
4. Do not split between letters. With the newly ubiquitous font
attributes, all formatters will be able to figure out what to do
without putting each letter into a separate box.
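Point 1 can be pictured with a minimal sketch, assuming each box records its parent and a font value copied from the parent when the box is created; the struct, enum, and function names are hypothetical:

```c
#include <stddef.h>

enum box_font { FONT_NONE, FONT_ROMAN, FONT_BOLD, FONT_ITALIC };

struct box {
	struct box	*parent;
	enum box_font	 font;
};

/* Initialize a new box, inheriting the font of its parent, if any. */
static void
box_init(struct box *bp, struct box *parent)
{
	bp->parent = parent;
	bp->font = parent == NULL ? FONT_NONE : parent->font;
}
```

Because the value is copied at creation time, finding the current font is a single field access instead of a walk up the chain of parents.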
Ingo Schwarze [Thu, 22 Jun 2017 00:30:20 +0000 (00:30 +0000)]
Fix font selection for text boxes in the terminal formatter.
Issue reported by bentley@.
The AST data structure is powerful enough that all required
information can easily be provided in the parser, and no change
of the formatting code is needed.
Ingo Schwarze [Wed, 21 Jun 2017 20:50:50 +0000 (20:50 +0000)]
Outside explicit font context, give every letter its own box.
The formatters need this to correctly select fonts.
Missing feature reported by bentley@.
Ingo Schwarze [Wed, 21 Jun 2017 18:04:34 +0000 (18:04 +0000)]
Recognize well-known function names (the same that Heirloom recognizes,
which includes those recognized by groff) and wrap them in a roman box
unless they already are in roman context.
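A minimal sketch of the lookup, assuming a sorted table searched with bsearch(3). The table is only an illustrative subset of the names groff and Heirloom eqn treat as functions, and the function names are hypothetical:

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative subset only, kept in strcmp(3) order for bsearch(3). */
static const char *const func_names[] = {
	"cos", "exp", "lim", "log", "max", "min", "sin", "tan"
};

static int
func_cmp(const void *key, const void *elem)
{
	return strcmp((const char *)key, *(const char *const *)elem);
}

/* Return 1 if word should be set in roman as a function name. */
static int
is_func_name(const char *word)
{
	return bsearch(word, func_names,
	    sizeof(func_names) / sizeof(func_names[0]),
	    sizeof(func_names[0]), func_cmp) != NULL;
}
```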
Missing feature reported by bentley@.
Ingo Schwarze [Sun, 18 Jun 2017 17:36:03 +0000 (17:36 +0000)]
Implement appending to standard man(7) and mdoc(7) macros with .am.
With roff_getstrn(), provide finer control over which definitions
can be used for what:
* All definitions can be used for .if d tests and .am appending.
* User-defined for \* expansion, .dei expansion, and macro calling.
* Predefined for \* expansion.
* Standard macros, original or renamed, for macro calling.
Several related improvements while here:
* Do not return string table entries that have explicitly been removed.
* Do not create a rentab entry when trying to rename a non-existent macro.
* Clear an existing rentab entry when the external interface
roff_setstr() is called with its name.
* Avoid trailing blanks in macro lines generated from renamed
and from aliased macros.
* Delete the duplicate __m*_reserved[] tables, just use roff_name[].
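The access rules listed above can be expressed as a bitmask table. This minimal sketch uses hypothetical names (mandoc's own flags may differ) and returns, for each purpose of a lookup, the set of definition kinds that may satisfy it:

```c
/* Kinds of definitions (hypothetical names). */
#define DEF_USER	0x01	/* user-defined macro or string */
#define DEF_PRED	0x02	/* predefined string */
#define DEF_STD		0x04	/* standard macro, original name */
#define DEF_REN		0x08	/* standard macro, renamed */
#define DEF_ANY		(DEF_USER | DEF_PRED | DEF_STD | DEF_REN)

enum lookup_purpose {
	LOOK_IFDEF,	/* .if d test */
	LOOK_APPEND,	/* .am appending */
	LOOK_STRING,	/* string interpolation with \* */
	LOOK_DEI,	/* .dei expansion */
	LOOK_CALL	/* macro calling */
};

/* Return the kinds of definitions a lookup for this purpose may use. */
static unsigned int
lookup_accepts(enum lookup_purpose purpose)
{
	switch (purpose) {
	case LOOK_IFDEF:
	case LOOK_APPEND:
		return DEF_ANY;
	case LOOK_STRING:
		return DEF_USER | DEF_PRED;
	case LOOK_DEI:
		return DEF_USER;
	case LOOK_CALL:
		return DEF_USER | DEF_STD | DEF_REN;
	}
	return 0;
}
```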
Ingo Schwarze [Fri, 16 Jun 2017 20:01:06 +0000 (20:01 +0000)]
Multiple tbl(7) improvements:
* Do not discard data that lacks a matching layout cell but remains
within the number of columns of the table as a whole.
* Do not insert dummy data rows for any layout row starting with a
horizontal line, but only for layout rows that would discard all
the data on a matching non-empty data row.
* Print horizontal lines specified in the layout even if there is
no matching data cell.
* Improve the logic for extending vertical lines to adjacent rows,
for choosing cross marks versus line segments, and some related details.
Ingo Schwarze [Wed, 14 Jun 2017 22:51:25 +0000 (22:51 +0000)]
Naive implementation of the roff(7) .po (page offset) request.
This clearly works when .po is called on the top level but might
not be sophisticated enough if people call .po inside indentation-changing
contexts; then again, I haven't seen that in manual pages (yet :).
Ingo Schwarze [Tue, 13 Jun 2017 19:34:40 +0000 (19:34 +0000)]
Partial support for the \n[an-margin] number register.
Manuals autogenerated from reStructuredText are reckless enough
to peek at this non-portable, implementation-dependent, highly
groff-specific internal register - for no good reason, because the
man(7) language natively provides in a much simpler way what they
are trying to emulate here with much fragility.
A full implementation would be very hard because it would require
access to output-device-specific formatting data at the roff(7)
preprocessor stage, which mandoc doesn't support at all.
So hardcode a few magic numbers as reStructuredText expects them
for terminal output. For other output modes (like HTML), code using
this register is utterly broken anyway.
Ingo Schwarze [Tue, 13 Jun 2017 16:12:01 +0000 (16:12 +0000)]
If the layout is empty except for requesting a left vertical frame,
record that detail in struct tbl_opts, such that term_tbl() can do
correct column calculations and doesn't prematurely break lines.
Fixes the tbl/layout/empty regression test that got broken when
line breaking in text block cells was implemented.
Ingo Schwarze [Tue, 13 Jun 2017 15:06:56 +0000 (15:06 +0000)]
Delete the arbitrary range restriction for -Owidth.
We provide users with tools. We don't attempt to prevent them from
using them in stupid ways: depending on the context, not every
stupid-looking use is necessarily actually stupid, and not every
stupidity can be automatically detected anyway, so don't even try.
Ingo Schwarze [Tue, 13 Jun 2017 13:51:11 +0000 (13:51 +0000)]
Explicitly ignore .br, .ce, and .sp inside tbl(7) text blocks.
With the current code structure, they would appear at the wrong
place in the syntax tree, so it is better to not insert them
into the tree at all and issue an UNSUPP message instead.
Ingo Schwarze [Mon, 12 Jun 2017 22:49:16 +0000 (22:49 +0000)]
Two minor fixes for the "allbox" modifier:
1. It does not reduce explicit "||" in the layout to "|".
2. It does not cause three horizontal lines at the end of a table,
even if the table ends with an explicit "_" data line.
Ingo Schwarze [Mon, 12 Jun 2017 22:05:57 +0000 (22:05 +0000)]
If a tbl(7) layout contains a 'w' (minimum width) modifier for a
given column, that column contains no literal or numeric cell of
larger width, and all text block cells in that column can be line
wrapped to fit into that minimum width, groff does not increase
that column width beyond the specified minimum: so do the same.
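The condition in this entry can be sketched as a predicate. It assumes widths are counted in terminal columns and that a text block cell "can be line wrapped to fit" exactly when its longest unbreakable word is no wider than the minimum; the function and parameter names are hypothetical:

```c
#include <stddef.h>

/*
 * Hypothetical predicate: return 1 if a column with minimum width
 * minw (from a 'w' layout modifier) can stay at exactly that width.
 * lit[] holds the widths of its literal and numeric cells,
 * word[] the width of the longest word in each text block cell.
 */
static int
keeps_min_width(size_t minw, const size_t *lit, size_t nlit,
    const size_t *word, size_t nword)
{
	size_t	 i;

	for (i = 0; i < nlit; i++)
		if (lit[i] > minw)
			return 0;
	for (i = 0; i < nword; i++)
		if (word[i] > minw)
			return 0;
	return 1;
}
```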
Ingo Schwarze [Sun, 11 Jun 2017 19:45:05 +0000 (19:45 +0000)]
Style message about legacy man(7) date format in mdoc(7) documents
and operating system dependent messages about missing or unexpected
Mdocdate; inspired by mdoclint(1).
Ingo Schwarze [Sun, 11 Jun 2017 19:37:00 +0000 (19:37 +0000)]
Style message about legacy man(7) date format in mdoc(7) documents
and operating system dependent messages about missing or unexpected
Mdocdate; inspired by mdoclint(1).
Ingo Schwarze [Sun, 11 Jun 2017 14:24:55 +0000 (14:24 +0000)]
Do not issue the message "no blank before trailing delimiter" for .No.
In practice, that message only matters inside .Bf, and even there, it
can occasionally be a false positive. In all other cases, it usually
is a false positive, so it is better to drop it outright.
Suggested by jmc@.
Ingo Schwarze [Thu, 8 Jun 2017 18:11:22 +0000 (18:11 +0000)]
Implement w layout specifier (minimum column width).
Improve width calculation of text blocks.
Reduces the groff/mandoc diff in Base+Xenocara by about 800 lines.
Ingo Schwarze [Wed, 7 Jun 2017 20:58:49 +0000 (20:58 +0000)]
Also catch "new sentence, new line" if there are three blanks
between the sentences. Thomas Klausner says he has seen some
of these, and I don't see any false positives.
Ingo Schwarze [Wed, 7 Jun 2017 20:30:40 +0000 (20:30 +0000)]
Make "new sentence, new line" detection stricter:
Also catch cases where the new sentence starts with a one-letter word
and the input line is broken right after that word.
Suggested by Thomas Klausner <wiz @ NetBSD>.
It's merely a three-bit diff, changing one byte from 0x34 to 0x33,
so what can possibly go wrong...
Ingo Schwarze [Wed, 7 Jun 2017 20:01:19 +0000 (20:01 +0000)]
Prepare the terminal driver for filling multiple columns in parallel,
second step: make the per-column byte pointer persistent across
term_flushln() calls, such that a subsequent call can continue at
the point where the previous call left. If more than one column
is in use, return from term_flushln() when the column is full,
rather than breaking the output line.
No functional change, because nothing sets up multiple columns yet.
Ingo Schwarze [Wed, 7 Jun 2017 17:38:26 +0000 (17:38 +0000)]
Prepare the terminal driver for filling multiple columns in parallel,
first step: split column data out of the terminal state struct into
a new column state struct and use an array of such column state
structs. No functional change.
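The idea behind these two preparatory steps, an array of per-column states whose byte positions survive between calls so that several columns can be filled in turn, can be sketched as below. This is a simplified model with hypothetical names: it breaks at a fixed byte count, whereas the real terminal driver breaks at word boundaries and tracks much more state:

```c
#include <string.h>

/* Hypothetical per-column state, one array element per column. */
struct col_state {
	const char	*text;	/* content of this column */
	size_t		 pos;	/* byte position, persistent across calls */
	size_t		 width;	/* column width in bytes */
};

/*
 * Copy at most one output line's worth of the column into out
 * (which must hold width + 1 bytes), continuing from where the
 * previous call stopped; return the number of bytes copied.
 */
static size_t
col_fill(struct col_state *c, char *out)
{
	size_t	 n;

	n = strlen(c->text + c->pos);
	if (n > c->width)
		n = c->width;
	memcpy(out, c->text + c->pos, n);
	out[n] = '\0';
	c->pos += n;
	return n;
}
```

A caller filling a two-column row would call col_fill() on the first column, then on the second, emit the combined output line, and repeat until every column returns 0.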
Ingo Schwarze [Wed, 7 Jun 2017 02:14:09 +0000 (02:14 +0000)]
The \h escape sequence provides another method for moving backwards,
and after that, previously written output gets overwritten, but
overwriting with blanks does *not* erase previously written content.
Yes, manual pages exist that are crazy enough to rely on that...
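A minimal sketch of that overwriting rule, assuming the output line is a plain byte buffer in which each byte occupies one column; the function name is hypothetical:

```c
#include <stddef.h>

/*
 * Hypothetical helper: write s into line starting at column col,
 * as after moving backwards with \h.  Non-blank characters replace
 * what is there; blanks leave previously written content intact.
 * The caller must ensure that line is long enough.
 */
static void
overwrite(char *line, size_t col, const char *s)
{
	for (; *s != '\0'; s++, col++)
		if (*s != ' ')
			line[col] = *s;
}
```

For example, writing "X Y" at column 1 over "abcdef" yields "aXcYef": the blank at column 2 leaves the c in place.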
Ingo Schwarze [Wed, 7 Jun 2017 00:50:34 +0000 (00:50 +0000)]
Implement the roff(7) .rn (rename macro or string) request.
Renaming a user-defined macro is very simple: just copy
the definition to the new name and delete the old name.
Renaming high-level macros is a bit tricky: use a dedicated
key-value-table, with non-standard names as keys and standard
names as values. When a macro is found that is not user-defined,
look it up in the "renamed" table and translate it back to the
standard name before passing it on to the high-level parsers.
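The translation step for renamed high-level macros can be sketched with a small key-value table, the keys being the non-standard names and the values the standard ones. The struct and function names are hypothetical, and a linear search stands in for whatever table mandoc actually uses:

```c
#include <stddef.h>
#include <string.h>

/* One .rn entry: non-standard name (key) -> standard name (value). */
struct renamed {
	const char	*key;
	const char	*value;
};

/*
 * Translate a macro name that is not user-defined back to the
 * standard name, or return it unchanged if it was never renamed.
 */
static const char *
unrename(const struct renamed *tab, size_t n, const char *name)
{
	size_t	 i;

	for (i = 0; i < n; i++)
		if (strcmp(tab[i].key, name) == 0)
			return tab[i].value;
	return name;
}
```

So after ".rn Sh XX", an input line ".XX NAME" would reach the mdoc(7) parser as ".Sh NAME".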
Ingo Schwarze [Tue, 6 Jun 2017 15:01:04 +0000 (15:01 +0000)]
Minimal implementation of the roff(7) .ce request (center a number
of input lines without filling).
Contrary to groff, high-level macros abort .ce mode for now.
Ingo Schwarze [Sun, 4 Jun 2017 22:44:15 +0000 (22:44 +0000)]
Implement the roff(7) .mc (right margin character) request.
The Tcl/Tk manual pages use this extensively.
Delete the TERM_MAXMARGIN hack, it breaks .mc inside .nf;
instead, implement a proper TERMP_BRNEVER flag.
Ingo Schwarze [Sun, 4 Jun 2017 18:50:35 +0000 (18:50 +0000)]
Make term_flushln() simpler and more robust:
Eliminate the "overstep" state variable.
The information is already contained in "viscol".
Minus 60 lines of code, no functional change intended.
Ingo Schwarze [Sun, 4 Jun 2017 00:13:15 +0000 (00:13 +0000)]
Pure preprocessor implementation of the roff(7) .ec and .eo requests
(escape character control), touching nothing after the preprocessing
stage and keeping even the state variable local to the preprocessor.
Since the escape character is also used for line continuation, this
requires pulling the implementation of line continuation from the
input reader to the preprocessor, which also considerably shortens
the code required for that.
When the escape character is changed, simply let the preprocessor
replace bare by escaped backslashes and instances of the non-standard
escape character with bare backslashes - that's all we need.
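A minimal sketch of that preprocessor rewrite, under stated assumptions: the function name is hypothetical, ec is '\0' after .eo (no escape character at all), and bare backslashes are written as \e, which is one possible escaped form and not necessarily the one mandoc emits:

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical sketch: rewrite one input line that uses the escape
 * character ec into standard roff(7) with backslash as the escape
 * character.  ec is '\0' after .eo, when nothing is an escape.
 * Bare backslashes become "\e"; instances of ec become bare
 * backslashes.  Returns a malloc(3)ed string or NULL.
 */
static char *
ec_translate(const char *in, char ec)
{
	const char	*p;
	char		*out, *q;

	/* Worst case: every byte is a backslash that doubles in size. */
	if ((out = malloc(2 * strlen(in) + 1)) == NULL)
		return NULL;
	for (p = in, q = out; *p != '\0'; p++) {
		if (ec != '\0' && *p == ec)
			*q++ = '\\';
		else if (*p == '\\') {
			*q++ = '\\';
			*q++ = 'e';
		} else
			*q++ = *p;
	}
	*q = '\0';
	return out;
}
```

For example, after ".ec @", the line "@fBbold@fR a\b" becomes "\fBbold\fR a\eb", which the rest of the parser handles without knowing that .ec was ever used.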
Oh, and if anybody dares to use these requests in OpenBSD manuals,
sending a medium-sized pack of axe-murderers after them might be a
worthwhile part of the punishment, but probably insufficient on its own.
Ingo Schwarze [Fri, 2 Jun 2017 19:21:23 +0000 (19:21 +0000)]
Partial implementation of \h (horizontal line drawing function).
A full implementation would require access to output device properties
and state variables (both only available after the main parser has
finalized the parse tree) before numerical expansions in the roff
preprocessor (i.e., before the main parser is even started).
Not trying to pull that stunt right now because the static-width
implementation committed here is sufficient for tcl-style manual pages
and already more complicated than I would have suspected.