Ingo Schwarze [Thu, 13 Dec 2018 03:40:13 +0000 (03:40 +0000)]
Cleanup, no functional change:
In libroff.h, nothing was left except the eqn(7) parser interface, which
isn't really part of the roff(7) parser, so rename it to eqn_parse.h.
While here, move struct eqn_def to eqn.c because that's the only
file using it, and let eqn_box_free() and eqn_free() handle NULL.
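
The NULL handling mentioned above can be sketched as follows. This is a minimal illustration, not mandoc's code: struct box, box_new() and box_free() are hypothetical stand-ins for the eqn(7) tree types and functions. The returned node count exists only to make the sketch easy to check.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse-tree node; not mandoc's struct eqn_box. */
struct box {
	struct box	*first;	/* first child, or NULL */
	struct box	*next;	/* next sibling, or NULL */
	char		*text;	/* owned copy of the text */
};

/* Allocate a node holding a copy of text; abort if out of memory. */
struct box *
box_new(const char *text)
{
	struct box	*bp;
	size_t		 sz;

	assert(text != NULL);
	sz = strlen(text) + 1;
	if ((bp = calloc(1, sizeof(*bp))) == NULL ||
	    (bp->text = malloc(sz)) == NULL)
		abort();
	memcpy(bp->text, text, sz);
	return bp;
}

/*
 * Free a node, its following siblings, and all their descendants.
 * Passing NULL is fine and frees nothing, so callers need no check.
 * Returns the number of nodes freed.
 */
int
box_free(struct box *bp)
{
	struct box	*next;
	int		 n = 0;

	while (bp != NULL) {
		next = bp->next;
		n += box_free(bp->first);	/* bp->first may be NULL */
		free(bp->text);
		free(bp);
		bp = next;
		n++;
	}
	return n;
}
```

Accepting NULL mirrors free(3) and lets cleanup paths call the function unconditionally.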
Ingo Schwarze [Wed, 12 Dec 2018 21:54:35 +0000 (21:54 +0000)]
Cleanup, no functional change:
No need to expose the tbl(7) syntax tree data structures everywhere.
Move them to their own include file, "tbl.h", and improve comments.
Ingo Schwarze [Tue, 4 Dec 2018 05:21:04 +0000 (05:21 +0000)]
Make sure all borders in a table are drawn in the same color.
Required because browsers tend to have inconsistent defaults:
For example, Firefox 62.0.2 sets border-color for tbody, but not for table,
and Pali Rohar reports that Chrome sets it for td, but not for tr or tbody.
The td part is from Pali Rohar, the tbody and tr parts from me.
Ingo Schwarze [Tue, 4 Dec 2018 03:28:58 +0000 (03:28 +0000)]
During validation, drop .br before a text line starting with a
blank, rather than teaching each formatter individually to ignore
the .br in such situations. That's simpler and also results in
better diagnostics.
Mark Harris <mark dot hsj at gmail dot com> reported
that -T html got confused in particular.
Ingo Schwarze [Tue, 4 Dec 2018 02:53:51 +0000 (02:53 +0000)]
Clean up the validation of .Pp, .PP, .sp, and .br. Make sure all
combinations are handled, and are handled in a systematic manner.
This resolves some erratic duplicate handling, handles a number of
missing cases, and improves diagnostics in various respects.
Move validation of .br and .sp to the roff validation module
rather than doing that twice in the mdoc and man validation modules.
Move the node relinking function to the roff library where it belongs.
In validation functions, only look at the node itself, at previous
nodes, and at descendants, not at following nodes or ancestors,
such that only nodes are inspected which are already validated.
Ingo Schwarze [Mon, 3 Dec 2018 21:00:10 +0000 (21:00 +0000)]
In the validators, translate obsolete macro aliases (Lp, Ot, LP, P)
to the standard forms (Pp, Ft, PP) up front, such that later code
does not need to look for the obsolete versions.
This reduces the risk of incomplete handling.
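
A minimal sketch of such an up-front translation, assuming macros are identified by name strings. The pairs come from the message above; the table layout and the function macro_canonical() are hypothetical, and the real validators likely work on tokens rather than strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Obsolete aliases and their standard forms, as listed above. */
static const struct {
	const char	*alias;
	const char	*standard;
} aliases[] = {
	{ "Lp", "Pp" },	/* mdoc(7) */
	{ "Ot", "Ft" },	/* mdoc(7) */
	{ "LP", "PP" },	/* man(7) */
	{ "P",  "PP" },	/* man(7) */
};

/* Return the standard form of name, or name itself if it is no alias. */
const char *
macro_canonical(const char *name)
{
	size_t	 i;

	assert(name != NULL);
	for (i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++)
		if (strcmp(name, aliases[i].alias) == 0)
			return aliases[i].standard;
	return name;
}
```

Once every node has passed through such a step, later code only needs to compare against the standard names.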
Ingo Schwarze [Mon, 3 Dec 2018 16:18:02 +0000 (16:18 +0000)]
Render .br as <br/>, not as an empty <div>.
The element <br/> was already employed for many other purposes,
so there is nothing wrong with using it.
Also, it is safer because <br/> is permitted in phrasing content,
whereas <div> is only allowed in flow content.
This is the first part of the HTML syntax audit which i have wanted
to do for a long time. Reminded by a loosely related bug report
from Mark Harris <mark dot hsj at gmail dot com>.
Examples of where this caused HTML nesting syntax errors:
* in man(7) code between .nf and .fi
* in mdoc(7) code between .Bd -unfilled and .Ed
* in mdoc(7) code between .Ql Xo and .Xc
* in mdoc(7) code between .Rs and .Re
Ingo Schwarze [Thu, 29 Nov 2018 23:08:13 +0000 (23:08 +0000)]
Do not draw horizontal lines through vertical spans
which are requested in the data section rather than in the layout.
Mini-feature found in misc/pfm(1).
Ingo Schwarze [Thu, 29 Nov 2018 21:40:53 +0000 (21:40 +0000)]
Now that it is better understood how borders work,
rewrite tbl_hrule() in a simpler way.
Fix several bugs in the process.
No more special flags, just use the existing TBL_OPT_* from mandoc.h.
Reduce the number of tracked rows from three to two, which is more logical:
one above the line and one below is sufficient to figure out crossings.
No more magic quirks, all conditions are readily comprehensible now.
Add comments.
Ingo Schwarze [Thu, 29 Nov 2018 01:55:02 +0000 (01:55 +0000)]
Better handle automatic column width assignments in the presence of
horizontal spans, by implementing a moderately difficult iterative
algorithm. The benefit is that spans containing long text no longer
cause an excessive width of their starting column.
The result is likely not optimal, in particular in the presence
of many spans overlapping in complicated ways or when spans
interact with equalizing or maximizing columns. But i doubt the
practical usefulness of making this more complicated.
Issue originally reported in synaptics(4), which now looks better,
by tedu@ three years ago, and reminded by Pali Rohar this summer.
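
To illustrate the general idea of iterative width assignment, here is a deliberately simple sketch. It is not the algorithm implemented in mandoc: struct span, widths_fit_spans() and the strategy of widening the narrowest covered column by one unit per pass are all assumptions made for this example.

```c
#include <assert.h>

/* Hypothetical description of one horizontal span. */
struct span {
	int	start;	/* index of the first column covered */
	int	ncol;	/* number of columns covered, at least 1 */
	int	width;	/* width required by the spanning text */
};

/*
 * Widen columns until every span fits.  colw[] holds the widths
 * already required by ordinary cells; gap is the spacing between
 * adjacent columns.  In each pass, a span that is still too narrow
 * widens its currently narrowest column by one unit.
 */
void
widths_fit_spans(int *colw, int ncol, const struct span *sp, int nsp,
    int gap)
{
	int	 changed, have, i, j, narrow;

	do {
		changed = 0;
		for (i = 0; i < nsp; i++) {
			assert(sp[i].start >= 0 && sp[i].ncol >= 1 &&
			    sp[i].start + sp[i].ncol <= ncol);
			have = gap * (sp[i].ncol - 1);
			narrow = sp[i].start;
			for (j = sp[i].start;
			    j < sp[i].start + sp[i].ncol; j++) {
				have += colw[j];
				if (colw[j] < colw[narrow])
					narrow = j;
			}
			if (have < sp[i].width) {
				colw[narrow]++;
				changed = 1;
			}
		}
	} while (changed);
}
```

With two columns of width 3, a gap of 1 and a span needing width 12, this yields widths 6 and 5 rather than putting the whole excess into the starting column.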
Ingo Schwarze [Wed, 28 Nov 2018 14:23:06 +0000 (14:23 +0000)]
Bugfix: never set termp->enc to the ambiguous value TERMENC_LOCALE,
but instead set it to TERMENC_UTF8 or TERMENC_ASCII.
Makes tbl(7) box drawing work under -T locale (that is, by default
when LC_CTYPE is defined appropriately).
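
The principle can be sketched like this: resolve the ambiguous value once, so that later code only ever sees a concrete encoding. The enum and function names are hypothetical, not the TERMENC_* constants from mandoc, and the codeset string is passed in to keep the sketch independent of the environment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical encoding selectors for this sketch. */
enum enc {
	ENC_ASCII,
	ENC_UTF8,
	ENC_LOCALE	/* "decide from the locale"; never stored */
};

/*
 * Turn the requested encoding into a concrete one.
 * codeset is the name of the locale's character encoding
 * and may be NULL if it is unknown.
 */
enum enc
enc_resolve(enum enc requested, const char *codeset)
{
	if (requested != ENC_LOCALE)
		return requested;
	if (codeset != NULL && strcmp(codeset, "UTF-8") == 0)
		return ENC_UTF8;
	return ENC_ASCII;
}
```

In a real program, the codeset argument could come from nl_langinfo(CODESET) after calling setlocale(LC_CTYPE, "").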
Ingo Schwarze [Mon, 26 Nov 2018 21:06:02 +0000 (21:06 +0000)]
Implement tbl(7) lines in -T html output,
as far as they are on the edges of table cells
rather than going through the middle of cells:
* the box, doublebox, and allbox options;
* the | and || layout modifiers;
* and the _ and = data lines;
- but not yet _ and = in individual layout and data cells.
Missing feature reported by Pali dot Rohar at gmail dot com.
Ingo Schwarze [Mon, 26 Nov 2018 17:44:34 +0000 (17:44 +0000)]
When a conditional block is closed by putting "\}" on a text line
by itself (which is somewhat unusual but not invalid; most authors
use the empty macro line ".\}" instead), agree more closely with
groff and do not produce a double space in the output.
Quirk reported by millert@.
While here, tweak the rest of the function body of roff_cond_text()
to more closely match roff_cond_sub(). The subtly different handling
could make people (including myself) wonder whether there is any
point in being different. Testing shows there is not.
Ingo Schwarze [Mon, 26 Nov 2018 17:11:11 +0000 (17:11 +0000)]
Mark Harris pointed out that people might have doubts whether all files
contained in the mandoc toolkit are "code and documentation", and whether
this is of any consequence for licensing, so clarify.
Ingo Schwarze [Mon, 26 Nov 2018 15:02:38 +0000 (15:02 +0000)]
Place mandoc.css into the public domain.
The reason for doing this rather than using the ISC license
is that i guess that in some contexts, a requirement to preserve
a Copyright and license header might be inconvenient, and i really
don't care at all how people use it.
What matters is that they do use it, or something similar - attempts
to use mandoc without any CSS are a constant source of grief and
bogus bug reports because HTML without CSS doesn't look very good:
the more structural and semantic and the less presentational and
old-fashioned the HTML, the more so.
Thanks to Mark Harris <mark dot hsj at gmail dot com> for pointing out
that the permissions on this particular file were unclear.
Ingo Schwarze [Mon, 26 Nov 2018 01:51:46 +0000 (01:51 +0000)]
Simplify writing of tbl(7) cells by using the new feature of passing
a NULL pointer for the value of a style attribute, in which case
the attribute is omitted from the HTML element.
Minus 12 lines of ugly and repetitive code, no functional change.
Ingo Schwarze [Mon, 26 Nov 2018 01:38:23 +0000 (01:38 +0000)]
Support more than one style attribute on the same HTML element.
In fact, this is already required when a table uses non-default
horizontal and vertical alignment in the same cell.
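
The two HTML writer changes above (several style declarations on one element, and omitting a declaration when its value is NULL) can be sketched together. The function style_add() is hypothetical and only builds the content of a style attribute in a caller-supplied buffer; the actual HTML writer interface in mandoc is different.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Append "name: value;" to the style string in buf.
 * A NULL value means "no such declaration" and appends nothing,
 * so callers need not special-case default settings.
 * Returns 0 on success or -1 if buf is too small.
 */
int
style_add(char *buf, size_t bufsz, const char *name, const char *value)
{
	size_t	 len;
	int	 n;

	assert(buf != NULL && name != NULL);
	if (value == NULL)
		return 0;
	len = strlen(buf);
	if (len >= bufsz)
		return -1;
	n = snprintf(buf + len, bufsz - len, "%s%s: %s;",
	    len > 0 ? " " : "", name, value);
	if (n < 0 || (size_t)n >= bufsz - len) {
		buf[len] = '\0';	/* undo the partial write */
		return -1;
	}
	return 0;
}
```

A cell with non-default horizontal and vertical alignment then needs two calls, while a cell with default alignment passes NULL and gets no declaration at all.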
Ingo Schwarze [Sat, 24 Nov 2018 23:03:18 +0000 (23:03 +0000)]
Implement horizontal and vertical alignment of tbl(7) cell content
in -T html output. This does not handle spanned cells yet.
Missing feature reported by Pali dot Rohar at gmail dot com.
Ingo Schwarze [Fri, 23 Nov 2018 19:17:05 +0000 (19:17 +0000)]
When a font escape appears in the middle of a string,
make sure it doesn't cause output of bogus whitespace.
Fixing a bug reported by Pali dot Rohar at gmail dot com.
Ingo Schwarze [Thu, 22 Nov 2018 12:33:52 +0000 (12:33 +0000)]
Correct and shorten the description of the sort order of apropos(1)
results. As a matter of fact, which manpath the page comes from
does not matter in that context. That only matters for the priority
of pages in man(1) mode (without -a, -f, and -k).
Noticed while working on a patch from Yuri Pankov <yuripv at FreeBSD>.
Ingo Schwarze [Thu, 22 Nov 2018 12:01:46 +0000 (12:01 +0000)]
In apropos(1) output, stop sorting .Nm search results by name
priorities (bits). The obscure feature wasn't documented and merely
confused people - for example Edward Tomasz Napierala <trasz at
FreeBSD>, see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=227408.
Smaller patch provided by Yuri Pankov <yuripv at FreeBSD>, but i'm
also retiring the now unused "bits" member from struct manpage.
Simplification is good.
Ingo Schwarze [Thu, 22 Nov 2018 11:30:23 +0000 (11:30 +0000)]
In -T locale (the default), -T ascii, and -T utf8 mode, provide a new
output option -O tag[=term] to move right to the definition of "term" when
opening the manual page in a pager, effectively porting the -T html
fragment name feature - https://man.openbsd.org/ksh#ulimit - to the
terminal. Try:
$ man -O tag uvm_sysctl
$ man -O tag=ulimit ksh
$ man -O tag 3 compress
Feature development triggered by a question from kn@. Klemens also
tested, provided feedback that resulted in improvements, and provided
an OK.
Ingo Schwarze [Mon, 19 Nov 2018 19:27:37 +0000 (19:27 +0000)]
Improve POSIX compliance by making case-insensitive extended
regular expressions the default in man(1) -k searches, also matching
what the man-db package used by many Linux distributions does.
Originally requested by Wolfram Schneider <wosch at FreeBSD>
via Yuri Pankov <yuripv at FreeBSD>.
Feedback and OK cheloha@, and no objections when shown on tech@.
Thanks to cheloha@ for pointing out that POSIX requires this behaviour
and for the suggestion to explicitly say that *extended* regular
expressions are used here.
While here, unify spelling of case-[in]sensitive, fix a typo,
update the EXAMPLES, and add a STANDARDS section.
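
For readers unfamiliar with the POSIX regex interface, case-insensitive extended regular expressions can be requested as sketched below. The helper ere_icase_match() is a hypothetical name for this example; regcomp(3), regexec(3), regfree(3) and the REG_* flags are the standard interface.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * Match text against a case-insensitive extended regular expression.
 * Returns 1 on a match, 0 on no match, -1 if the pattern is invalid.
 */
int
ere_icase_match(const char *pattern, const char *text)
{
	regex_t	 re;
	int	 rc;

	assert(pattern != NULL && text != NULL);
	if (regcomp(&re, pattern,
	    REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

REG_EXTENDED makes "|" and "()" work without backslashes, and REG_ICASE makes "MAN" match "man".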
Ingo Schwarze [Mon, 19 Nov 2018 19:22:07 +0000 (19:22 +0000)]
Correctly construct empty lists in dbm_page_get().
Original commit message by the author of this bugfix patch, bluhm@:
lstmatch() expects a list of strings separated by \0 and terminated
with \0\0. In the NULL case dbm_page_get() returned only simple
strings so correct processing was depending on data layout. Use
an additional \0 to terminate the single string lists. Found by
mandoc regress since llvm linker on amd64 arranges strings differently.
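
The list format described above can be walked as in the following sketch. The functions lst_count() and lst_contains() are hypothetical illustrations and not the lstmatch() function from mandoc.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Count the entries of a list of strings that are separated
 * by \0 and terminated by \0\0.
 */
size_t
lst_count(const char *lst)
{
	size_t	 n = 0;

	assert(lst != NULL);
	while (*lst != '\0') {
		n++;
		lst += strlen(lst) + 1;	/* skip the entry and its \0 */
	}
	return n;
}

/* Return 1 if needle is one of the entries, 0 otherwise. */
int
lst_contains(const char *lst, const char *needle)
{
	assert(lst != NULL && needle != NULL);
	for (; *lst != '\0'; lst += strlen(lst) + 1)
		if (strcmp(lst, needle) == 0)
			return 1;
	return 0;
}
```

A single-entry list therefore has to be written as "name\0" in C source, where the compiler adds the second \0. With a plain "name", the loop would step past the only terminator and read whatever happens to follow in memory.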
Ingo Schwarze [Thu, 25 Oct 2018 01:32:40 +0000 (01:32 +0000)]
Implement the \f(CW and \f(CR (constant width font) escape sequences
for HTML output. Somewhat relevant because pod2man(1) relies on this.
Missing feature reported by Pali dot Rohar at gmail dot com.
Note that constant width font was already correctly selected before
this when required by semantic markup. Only attempting physical
markup with the low-level escape sequence was ineffective.
Ingo Schwarze [Tue, 23 Oct 2018 20:42:37 +0000 (20:42 +0000)]
The ctags(1) file format uses whitespace as a field delimiter, and
there is no escaping mechanism, so tags cannot contain whitespace.
Consequently, we used to simply not tag macro arguments containing
space characters. Instead, let's tag the first word, unless there
is a proper match for that word somewhere else. For example, this
makes ":tquery" work in ntpd.conf(5).
Feature suggested by kn@, who also thinks the implementation looks
reasonable and works in his testing.
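
Extracting the first word of a macro argument can be sketched as follows. The function tag_first_word() and the sample argument in the usage note are hypothetical, and the check for "a proper match for that word somewhere else" is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the first whitespace-delimited word of arg into buf.
 * Returns the length of the word, or 0 if there is no word
 * or if it does not fit into buf.
 */
size_t
tag_first_word(const char *arg, char *buf, size_t bufsz)
{
	size_t	 len;

	assert(arg != NULL && buf != NULL && bufsz > 0);
	arg += strspn(arg, " \t");	/* skip leading whitespace */
	len = strcspn(arg, " \t");	/* length up to next whitespace */
	if (len == 0 || len >= bufsz) {
		buf[0] = '\0';
		return 0;
	}
	memcpy(buf, arg, len);
	buf[len] = '\0';
	return len;
}
```

For an argument like "query from sourceaddr", this yields the tag "query", which contains no whitespace and is therefore representable in the ctags(1) format.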
Ingo Schwarze [Tue, 23 Oct 2018 17:18:01 +0000 (17:18 +0000)]
Input lines that are not blank but generate no output,
for example lines containing nothing but "\&", are significant
in no-fill mode and can be represented by blank lines inside <pre>.
Fixing a bug that Pali dot Rohar at gmail dot com found
in pod2man(1) output, for example Email::Address::XS(3p).
While here, inside no-fill mode, there is no need to encode
totally blank input lines by emulating .PP - just let them
through as we are inside <pre> anyway.
Ingo Schwarze [Fri, 19 Oct 2018 21:10:56 +0000 (21:10 +0000)]
Rewrite parse_path_info() to be four lines shorter, simplify ownership
of allocated strings, do not write to the input string, and improve
diagnostic output.
The confusing error message "invalid arch" as a reaction to mistyping
the release name was noticed by tb@, who likes the new code and message.
Ingo Schwarze [Thu, 4 Oct 2018 15:16:23 +0000 (15:16 +0000)]
Stop abusing subsections to represent the list of escape sequences;
instead, use .Bl -tag like everywhere else. The same was already
done for requests quite some time ago. Also, consistently mark up
escape sequences with .Ic, just like requests.
Ingo Schwarze [Tue, 2 Oct 2018 14:56:47 +0000 (14:56 +0000)]
Add an option -T html -O toc to add a brief table of contents near
the top of HTML pages containing at least two non-standard sections.
Suggested by Adam Kalisz and discussed with kristaps@ during EuroBSDCon 2018.
Ingo Schwarze [Tue, 2 Oct 2018 12:33:36 +0000 (12:33 +0000)]
Support a second argument to -O man,
selecting the format according to local existence of the file.
Suggested by kristaps@ during EuroBSDCon 2018.
Written on the train Frankfurt-Karlsruhe returning from EuroBSDCon.
Ingo Schwarze [Tue, 2 Oct 2018 12:18:33 +0000 (12:18 +0000)]
Render the eqn(7) "sqrt" function as U+221A in UTF-8 output.
This also agrees with what groff does.
Suggested by an attendee of EuroBSDCon 2018 in Bucuresti.
Written on the plane Bucuresti-Frankfurt returning from EuroBSDCon.
Ingo Schwarze [Mon, 1 Oct 2018 08:06:53 +0000 (08:06 +0000)]
Add missing URI encoding when writing HTTP redirects,
fixing a bug reported by <jungleboogie0 at gmail dot com> on bugs@.
While here, fully validate the arch name
such that we do not have to URI encode that one.
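
A minimal sketch of percent-encoding for the path part of a redirect target. The function uri_encode() is hypothetical and not the code used in mandoc; letting "/" pass through unencoded is an assumption suitable for paths only.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Percent-encode src into dst.  ASCII letters, digits, and the
 * characters "-._~/" are copied; every other byte becomes %XX.
 * Returns 0 on success or -1 if dst is too small.
 */
int
uri_encode(const char *src, char *dst, size_t dstsz)
{
	static const char	 hex[] = "0123456789ABCDEF";
	size_t			 o = 0;
	unsigned char		 c;

	assert(src != NULL && dst != NULL && dstsz > 0);
	while ((c = (unsigned char)*src++) != '\0') {
		if ((c < 0x80 && isalnum(c)) ||
		    strchr("-._~/", c) != NULL) {
			if (o + 1 >= dstsz)	/* keep room for \0 */
				return -1;
			dst[o++] = (char)c;
		} else {
			if (o + 3 >= dstsz)
				return -1;
			dst[o++] = '%';
			dst[o++] = hex[c >> 4];
			dst[o++] = hex[c & 0x0f];
		}
	}
	dst[o] = '\0';
	return 0;
}
```

Input that is validated against a strict whitelist beforehand, as the message above describes for the arch name, does not need to pass through such a function.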
Ingo Schwarze [Mon, 27 Aug 2018 23:13:44 +0000 (23:13 +0000)]
Reduce excessive right padding in tagged list heads.
The 1.2em was an approximate equivalent of the 2n traditionally used
for terminal display, but it is much too wide for HTML rendering.
Issue reported by mikeb@.
Ingo Schwarze [Sat, 25 Aug 2018 16:53:38 +0000 (16:53 +0000)]
Rudimentary implementation of the roff(7) .char (output glyph
definition) request, used for example by groff_hdtbl(7).
This simplistic implementation may interact incorrectly
with the .tr (input character translation) request.
But come on, you are not only using .char *and* .tr, but you do so
with respect to the same character in the same manual page?
Ingo Schwarze [Thu, 23 Aug 2018 19:33:27 +0000 (19:33 +0000)]
The upcoming .while request will have to re-execute roff(7) lines
parsed earlier, so they will have to be saved for reuse - but the
read.c preparser does not know yet whether a line contains a .while
request before passing it to the roff parser. To cope with that,
save all parsed lines for now. Even shortens the code by 20 lines.
Ingo Schwarze [Thu, 23 Aug 2018 14:29:38 +0000 (14:29 +0000)]
Implement the roff(7) .shift and .return requests,
for example used by groff_hdtbl(7) and groff_mom(7).
Also correctly interpolate arguments during nested macro execution
even after .shift and .return, implemented using a stack of argument
arrays.
Note that only read.c, but not roff.c can detect the end of a macro
execution, and the existence of .shift implies that arguments cannot
be interpolated up front, so unfortunately, this includes a partial
revert of roff.c rev. 1.337, moving argument interpolation back into
the function roff_res().
Ingo Schwarze [Tue, 21 Aug 2018 18:15:22 +0000 (18:15 +0000)]
Implement the \\$@ escape sequence (insert all macro arguments,
quoted) in addition to the already supported \\$* (similar, but
unquoted). Then use \\$@ to improve the implementation of
the .als request (macro alias).
Needed by groff_hdtbl(7).
Gosh, it feels like the manual pages of the groff package are
exercising every bloody roff(7) feature under the sun. In the
manual page source code itself, not merely in the implementation
of the used macro packages, that is.
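
The difference between the unquoted and the quoted form can be sketched as follows. The function args_join() is a hypothetical illustration; it does not handle double quotes inside arguments and is not the interpolation code from roff.c.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Join the arguments with single spaces into buf.
 * If quoted is non-zero, wrap each argument in double quotes
 * (like the quoted form), otherwise insert them as they are
 * (like the unquoted form).
 * Returns 0 on success or -1 if buf is too small.
 */
int
args_join(char *buf, size_t bufsz, int argc, const char *const *argv,
    int quoted)
{
	const char	*q = quoted ? "\"" : "";
	size_t		 len = 0;
	int		 i, n;

	assert(buf != NULL && bufsz > 0);
	buf[0] = '\0';
	for (i = 0; i < argc; i++) {
		n = snprintf(buf + len, bufsz - len, "%s%s%s%s",
		    i > 0 ? " " : "", q, argv[i], q);
		if (n < 0 || (size_t)n >= bufsz - len)
			return -1;
		len += (size_t)n;
	}
	return 0;
}
```

Only the quoted form preserves argument boundaries when the result is parsed again: the two arguments "a b" and "c" come back as two arguments rather than three. That is why it is the better fit for forwarding arguments through an alias.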
Ingo Schwarze [Tue, 21 Aug 2018 16:06:48 +0000 (16:06 +0000)]
Improve the ASCII rendering of \(Po (Pound Sterling)
and of the playing card suits to match groff, using feedback
from Ralph Corderoy <ralph at inputplus dot co dot uk>.
Ingo Schwarze [Tue, 21 Aug 2018 01:59:22 +0000 (01:59 +0000)]
Fix some issues found looking at groff_char(7):
* Add two missing characters, \('Y and \('y.
* The Weierstrass p is not capital, see http://unicode.org/notes/tn27/.
* Add a groff-compatible ASCII transliteration for U+02DC: "~".
Ingo Schwarze [Mon, 20 Aug 2018 17:25:09 +0000 (17:25 +0000)]
Expand \n(.$ (the number of macro arguments) right in roff_userdef(),
before even reparsing the expanded macro.
That is the least dirty way to fix the bug that \n(.$ remained set
after execution of the user-defined macro ended. Any other way
to fix it would probably require changes to read.c, which really
shouldn't be bothered with such roff(7) internals.
Ingo Schwarze [Sun, 19 Aug 2018 23:58:09 +0000 (23:58 +0000)]
Disable one test for now that is broken after the addition of \).
It is not broken because of \), which is correctly implemented, but
the addition merely reveals a hidden bug elsewhere, almost certainly
in \\ handling. Given that \\ is among the most mysterious escape
sequences and using it is very strongly discouraged in manual pages,
fixing that is not urgent - and may be hard.
Ingo Schwarze [Sun, 19 Aug 2018 23:10:28 +0000 (23:10 +0000)]
Do alignment of non-numeric strings in numeric cells the same way
as groff, and also honour the explicit alignment indicator "\&".
This required an almost complete rewrite of both the measurement
function and the formatter function for numeric cells.
Ingo Schwarze [Sun, 19 Aug 2018 17:46:14 +0000 (17:46 +0000)]
Mostly complete implementation of the 'c' (character available)
roff conditional, except that the .char request still isn't supported
and that behaviour differs from groff in many edge cases.
But at least valid character names and numbers are now distinguished
from invalid ones.
This also fixes the bug that parsing of the 'c' conditional was
incomplete, which resulted in leaking the tested character to the
input parser at the beginning of the body when the condition was
inverted.
Ingo Schwarze [Sat, 18 Aug 2018 04:32:10 +0000 (04:32 +0000)]
Massively reduce the amount of text, cutting it down to what is needed
to understand existing man(7) code and deleting parts that would only
be useful for writing new documents, which we strongly discourage:
* Delete the MANUAL STRUCTURE section which merely duplicates mdoc(7).
* Delete internal cross references only useful for writing new code.
* Delete many instances of "included only for compatibility" as the
whole language is only provided for compatibility.
* Fix a few minor errors and omissions.
Ingo Schwarze [Fri, 17 Aug 2018 20:33:37 +0000 (20:33 +0000)]
Remove more pointer arithmetic that passes through regions outside
the array, which is undefined according to the C standard. Robert Elz
<kre at munnari dot oz dot au> pointed out i wasn't quite done yet.
Ingo Schwarze [Thu, 16 Aug 2018 15:05:34 +0000 (15:05 +0000)]
Do not calculate a pointer to a memory location before the beginning of
a static array. Christos Zoulas, Robert Elz, and Andreas Gustafsson
point out that is undefined behaviour by the C standard even if we
never access the pointer.
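
The pattern in question, and one way to avoid it, can be sketched as follows. The enum, the table and the function tok_name() are hypothetical and not taken from mandoc.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token numbers that do not start at zero. */
enum tok {
	TOK_FIRST = 100,
	TOK_A = TOK_FIRST,
	TOK_B,
	TOK_C,
	TOK_MAX
};

static const char *const names_real[TOK_MAX - TOK_FIRST] = {
	"A", "B", "C"
};

/*
 * Tempting, but undefined behaviour even if never dereferenced,
 * because the result points before the beginning of the array:
 *
 *	const char *const *names = names_real - TOK_FIRST;
 *
 * Instead, subtract the offset from the index at each lookup.
 */
const char *
tok_name(enum tok t)
{
	assert(t >= TOK_FIRST && t < TOK_MAX);
	return names_real[t - TOK_FIRST];
}
```

The C standard only permits pointers into an array object or one past its last element, so the subtraction has to happen on the integer index, not on the pointer.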
Ingo Schwarze [Thu, 16 Aug 2018 14:07:11 +0000 (14:07 +0000)]
Document \*(.T.
While here, delete the section about predefined strings.
For manual pages, the concept is not important enough to be discussed
in such a prominent place, and some aspects of the text were also
misleading. Add a shorter version of the relevant parts to the
description of the \* escape sequence instead.
Ingo Schwarze [Thu, 16 Aug 2018 13:54:06 +0000 (13:54 +0000)]
Implement the \*(.T predefined string (interpolate device name)
by allowing the preprocessor to pass it through to the formatters.
Used for example by the groff_char(7) manual page.
Ingo Schwarze [Wed, 15 Aug 2018 14:37:41 +0000 (14:37 +0000)]
Change comment: NetBSD just fixed their headers; but leave the
workaround in place for now for the benefit of older systems,
and because other systems might contain similar problems.
Ingo Schwarze [Wed, 15 Aug 2018 02:15:52 +0000 (02:15 +0000)]
Autodetect whether _GNU_SOURCE or _OPENBSD_SOURCE are needed; the
latter is a NetBSD idiosyncrasy reported by wiz@. Also take into
account that NetBSD declares getsubopt(3) in the wrong header.
Ingo Schwarze [Fri, 10 Aug 2018 20:40:45 +0000 (20:40 +0000)]
The groff man-ext macros define fonts CB, CI, and CR,
and some groff manual pages actually use them in .ft requests.
It's easy enough to handle these .ft requests in mandoc, too.
Ingo Schwarze [Thu, 9 Aug 2018 17:30:36 +0000 (17:30 +0000)]
If somebody asks "man 3 chmod",
don't respond with the lie: "No entry for chmod in the manual."
Instead, say "No entry for chmod in section 3 of the manual."
Came up after a question from kn@; OK kn@.
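
The message selection can be sketched as below. The function no_entry_msg() is hypothetical; only the two message texts are taken from the commit message above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format the "not found" message into buf.
 * sec is the section the user asked for, or NULL if none was given.
 * Returns the snprintf(3) result.
 */
int
no_entry_msg(char *buf, size_t bufsz, const char *name, const char *sec)
{
	assert(buf != NULL && name != NULL);
	if (sec == NULL)
		return snprintf(buf, bufsz,
		    "No entry for %s in the manual.", name);
	return snprintf(buf, bufsz,
	    "No entry for %s in section %s of the manual.", name, sec);
}
```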
Ingo Schwarze [Wed, 8 Aug 2018 14:30:48 +0000 (14:30 +0000)]
Even though we strongly discourage escaping hyphens in manual pages
in general, when introducing the *typographic* term "hyphen",
actually display a real hyphen in output modes supporting it.
Ingo Schwarze [Wed, 8 Aug 2018 14:16:08 +0000 (14:16 +0000)]
Reorder the text in the "Dashes and Hyphens" subsection to keep the
simplest and most important instructions together and at the
beginning. No text change.
Suggested by jmc@.
Ingo Schwarze [Wed, 8 Aug 2018 14:03:27 +0000 (14:03 +0000)]
Clarify the confusing "(text)" annotation in the character lists.
In some cases, it meant "render as an ASCII character in output
modes that have a notion of codepoints" (e.g. UTF-8, HTML); in other
cases, "render in the text font in output modes that also provide
a special font for mathematical symbols" (e.g. PostScript, PDF).
Also explicitly annotate the escape sequences that use a special
font if available.
OK bentley@
Ingo Schwarze [Wed, 8 Aug 2018 13:54:05 +0000 (13:54 +0000)]
After years of deliberation, finally provide a clear recommendation
for hyphens and minus signs in manual pages.
Since there is consensus that a typographically perfect solution is
impossible, let's KISS - just write "-", don't bother with "\-", all
currently relevant manual page formatters can handle "-" reasonably.
OK jmc@ bentley@