Ingo Schwarze [Mon, 3 Nov 2014 23:18:39 +0000 (23:18 +0000)]
Allow the five man(7) font macros to concatenate their line arguments,
the same way the mdoc(7) macros marked MDOC_JOIN do it.
In -Thtml, this removes bogus <br/> when the font macros are used
in no-fill mode; issue found by jsg@ in the Xcursor(3) SYNOPSIS.
As a bonus, this slightly reduces the size of the syntax tree.
Ingo Schwarze [Sat, 1 Nov 2014 06:03:13 +0000 (06:03 +0000)]
Use struct buf in libroff, it is very natural there
and reduces the number of arguments of many functions.
While here, sprinkle some KNF.
No functional change.
Ingo Schwarze [Sat, 1 Nov 2014 04:08:43 +0000 (04:08 +0000)]
Refactor, no functional change: Remove the parse point from struct buf.
Some functions need multiple parse points, some none at all,
and it varies whether any of them need to be passed around.
So better pass them as a separate argument, and only when needed.
Ingo Schwarze [Thu, 30 Oct 2014 20:10:02 +0000 (20:10 +0000)]
Major bugsquashing with respect to -offset and -width:
1. Support specifying the .Bd and .Bl -offset as a macro default width;
while here, simplify the code handling the same for .Bl -width.
2. Correct handling of .Bl -offset arguments: unlike .Bd -offset, the
arguments "left", "indent", and "indent-two" have no special meaning.
3. Fix the scaling of string length -offset and -width arguments in -Thtml.
Triggered by an incomplete documentation patch from bentley@.
Ingo Schwarze [Wed, 29 Oct 2014 03:35:09 +0000 (03:35 +0000)]
Some fine tuning of console rendering of named special characters.
Correct ASCII rendering: \(lb \(<> \(sd
Make ASCII rendering agree with groff, using backspace overstrike:
\(da \(ua \(dA \(uA \(fa \(c* \(c+ \(ib \(ip \(/_ \(pp \(is \(dd \(dg
Ingo Schwarze [Wed, 29 Oct 2014 00:17:43 +0000 (00:17 +0000)]
In terminal output, unify handling of Unicode and numbered character
escape sequences just like it was earlier implemented for -Thtml.
Do not let control characters other than ASCII 9 (horizontal tab)
propagate to the output; groff allows them, but that
really doesn't look like a great idea.
Let mchars_num2char() return int such that we can distinguish invalid \N
syntax from \N'0'. This also reduces the danger of signed char issues
popping up.
Ingo Schwarze [Tue, 28 Oct 2014 18:49:33 +0000 (18:49 +0000)]
In -Tascii mode, print "<?>" only for Unicode escapes of unknown
representation, not for character escapes with unknown names.
According to groff, the latter produce no output, and we now warn
about them.
Ingo Schwarze [Tue, 28 Oct 2014 17:36:19 +0000 (17:36 +0000)]
Make the character table available to libroff so it can check the
validity of character escape names and warn about unknown ones.
This requires mchars_spec2cp() to report unknown names again.
Fortunately, that doesn't require changing the calling code because
according to groff, invalid character escapes should not produce
output anyway, and now that we warn about them, that's fine.
Ingo Schwarze [Tue, 28 Oct 2014 02:43:59 +0000 (02:43 +0000)]
Refine -Tascii rendering of Unicode characters, mostly to better agree
with groff, in particular in cases where groff uses backspace overstrike.
In two cases, agreement is impossible because groff clobbers the
previous line: \(*G \(*S
In a number of cases, groff rendering is so misleading that i chose
to render differently: \(Sd \(TP \(Tp \(Po \(ps \(sc \(r! \(r? \(de
While here, also correct the \(la and \(ra Unicode code points.
Ingo Schwarze [Mon, 27 Oct 2014 20:41:58 +0000 (20:41 +0000)]
Support overstriking by backspace in PostScript and PDF output.
Of course, this is only a minor improvement; it would be much better
to support non-ASCII characters in these output modes, but that
would require major changes that i'm not going to work on right now.
The main reason for doing this is that it allows getting ASCII output
closer to groff.
Ingo Schwarze [Mon, 27 Oct 2014 16:29:06 +0000 (16:29 +0000)]
Handle output encoding for unicode, numbered and named escape sequences
in one common, safe way instead of three different ways. In particular,
* skip NUL, it is used to mean "no output desired"
* deny 0x01-0x1F and 0x7F-0x9F, print REPLACEMENT CHARACTER instead
* print 0x20-0x7E literally or name-encoded, as required
* print characters above 0x9F numerically
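The four rules above can be sketched as one small function. This is only an illustration in Python, not the C code from the commit: the name encode_codepoint, the NAMED table, and the choice of HTML numeric character references for the numeric and replacement cases are assumptions made for the example.

```python
REPLACEMENT = 0xFFFD  # U+FFFD REPLACEMENT CHARACTER

# Characters that need name-encoding in HTML (illustrative subset, assumed).
NAMED = {ord("<"): "&lt;", ord(">"): "&gt;", ord("&"): "&amp;", ord('"'): "&quot;"}


def encode_codepoint(cp):
    """Encode one code point following the four rules listed above."""
    if cp == 0:
        return ""  # NUL means "no output desired"
    if 0x01 <= cp <= 0x1F or 0x7F <= cp <= 0x9F:
        return "&#x{:X};".format(REPLACEMENT)  # denied control characters
    if 0x20 <= cp <= 0x7E:
        return NAMED.get(cp, chr(cp))  # literal, or name-encoded if required
    return "&#x{:X};".format(cp)  # everything above 0x9F, numerically
```

Routing all three kinds of escape sequences through one such function means the control character ranges are filtered in exactly one place.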
Ingo Schwarze [Mon, 27 Oct 2014 13:31:04 +0000 (13:31 +0000)]
Fix a regression in term.c rev. 1.229 reported by bentley@:
In UTF-8 output, do not print anything if mchars_spec2cp() returns 0.
In particular, this repairs handling of zero-width spaces (\&).
While here, let mchars_spec2cp() return 0xFFFD instead of -1
if the character is not found, simplifying the using code.
In HTML output, do not print obfuscated ASCII characters and
do not test for one-char escapes, mchars_spec2cp() already does that.
Ingo Schwarze [Sun, 26 Oct 2014 18:07:28 +0000 (18:07 +0000)]
In -Tascii mode, provide approximations even for some Unicode escape
sequences above codepoint 512 by doing a reverse lookup in the
existing mandoc_char(7) character table.
Again, groff isn't smart enough to do this and silently discards such
escape sequences without printing anything.
Ingo Schwarze [Sun, 26 Oct 2014 17:12:03 +0000 (17:12 +0000)]
Improve -Tascii output for Unicode escape sequences: For the first 512
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.
A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
and remove the workarounds on the higher level.
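A rough Python sketch of the rendering choice described above, for illustration only. The function name render_unicode, the convention that None stands for a parse error, and the entries of ASCII_APPROX are all hypothetical; the real table covers the first 512 code points and its entries may differ.

```python
REPLACEMENT = "\uFFFD"

# Hypothetical excerpt of an approximation table.
ASCII_APPROX = {0x00A0: " ", 0x00A9: "(C)", 0x00AB: "<<", 0x00BB: ">>"}


def render_unicode(cp, utf8):
    """Render the code point of a Unicode escape; cp is None on error."""
    if cp is None:
        return REPLACEMENT if utf8 else "<?>"
    if utf8 or cp < 0x80:
        return chr(cp)  # UTF-8 output, or an escape in the ASCII range
    return ASCII_APPROX.get(cp, "<?>")  # ASCII approximation if known
```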
Ingo Schwarze [Sat, 25 Oct 2014 15:23:56 +0000 (15:23 +0000)]
With the current architecture, we can't support inline equations
inside tables, sorry. So don't even try to parse tbl(7) blocks for
eqn(7) delimiters.
Broken table layout found in glPixelMap(3) while investigating
a bug report by Theo Buehler <theo at math dot ethz dot ch>.
Ingo Schwarze [Sat, 25 Oct 2014 14:35:37 +0000 (14:35 +0000)]
Report arguments to .EQ as an error, and simplify the code:
* drop trivial wrapper function roff_openeqn()
* drop unused first arg of function eqn_alloc()
* drop unused member "name" of struct eqn_node
While here, sync to OpenBSD by killing some trailing blanks.
Ingo Schwarze [Mon, 20 Oct 2014 01:43:48 +0000 (01:43 +0000)]
show the {MDOC,MAN}_EQN node, it contains interesting information,
in particular line and column numbers and flags;
but hide the uninteresting EQN_ROOT box
Ingo Schwarze [Thu, 16 Oct 2014 01:11:20 +0000 (01:11 +0000)]
Implement in-line equations, much needed by Xenocara manuals.
Put the steering into the roff parser rather than into the mdoc
parser such that it works for all macro languages and on both text
and macro lines.
Line breaks and blank characters generated before and after in-line
equations are not perfect yet, but let's do one thing at a time.
Ingo Schwarze [Tue, 14 Oct 2014 02:16:06 +0000 (02:16 +0000)]
Rudimentary implementation of the e, x, and z table layout modifiers
to equalize, maximize, and ignore the width of columns.
Does not yet take vertical rulers into account,
and does not do line breaks within table cells.
Considerably improves the lftp(1) manual; issue noticed by sthen@.
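One way to picture the three modifiers is the following Python sketch of a column width calculation. It is not the committed implementation: the function name column_widths, the data layout, and the single blank assumed between columns are illustrative assumptions, and like the commit it ignores vertical rulers and line breaks within cells.

```python
def column_widths(cells, modifiers, line_width):
    """Compute column widths.

    cells: list of rows, each a list of strings.
    modifiers: one set per column, with letters from "e", "x", "z".
    """
    ncols = len(modifiers)
    widths = [0] * ncols
    for row in cells:
        for i, text in enumerate(row[:ncols]):
            if "z" not in modifiers[i]:  # z: ignore the width of these cells
                widths[i] = max(widths[i], len(text))
    equal = [i for i in range(ncols) if "e" in modifiers[i]]
    if equal:  # e: equalize to the widest of the marked columns
        widest = max(widths[i] for i in equal)
        for i in equal:
            widths[i] = widest
    expand = [i for i in range(ncols) if "x" in modifiers[i]]
    if expand:  # x: share whatever is left of the line width
        spare = line_width - sum(widths) - (ncols - 1)  # one blank per gap (assumed)
        if spare > 0:
            share, extra = divmod(spare, len(expand))
            for n, i in enumerate(expand):
                widths[i] += share + (1 if n < extra else 0)
    return widths
```

For example, with modifiers e, e, x, z and a line width of 20, the first two columns end up equally wide, the third absorbs the spare space, and the fourth contributes no width.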
Ingo Schwarze [Mon, 13 Oct 2014 22:00:47 +0000 (22:00 +0000)]
Properly scale string length measurements for PostScript and PDF output;
this doesn't change anything for ASCII and UTF-8.
Problem reported by bentley@.
Ingo Schwarze [Mon, 13 Oct 2014 17:17:45 +0000 (17:17 +0000)]
Stricter syntax checking of Unicode character names:
Require exactly 4, 5 or 6 hex digits and allow nothing else.
This avoids mishandling stuff like \[ua] and \C'uA' as Unicode
and also fixes underlining in eqn(7) -Thtml output which uses \[ul].
Problem found and semantics suggested by kristaps@.
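The stricter check can be illustrated with a regular expression. This is a Python sketch rather than the C code; the name is_unicode_name is made up for the example, and accepting lower-case hex digits is an assumption.

```python
import re

# "u" followed by exactly 4, 5 or 6 hex digits, and nothing else.
UNICODE_NAME = re.compile(r"u[0-9A-Fa-f]{4,6}")


def is_unicode_name(name):
    """Tell whether an escape name looks like a Unicode escape."""
    return UNICODE_NAME.fullmatch(name) is not None
```

Under this rule, names such as "ua", "uA" and "ul" are no longer taken for Unicode escapes and can be looked up as ordinary character names.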
Ingo Schwarze [Sun, 12 Oct 2014 19:31:41 +0000 (19:31 +0000)]
Improve error handling in the eqn(7) parser.
Get rid of the first fatal error, MANDOCERR_EQNSYNT.
In eqn(7), there is no need to be bug-compatible with groff, so there
is no need to abandon the whole equation in case of a syntax error.
In particular:
* Skip "back", "delim", "down", "fwd", "gfont", "gsize", "left",
"right", "size", and "up" without arguments.
* Skip "gsize" and "size" with a non-numeric argument.
* Skip closing delimiters that are not open.
* Skip "above" outside piles.
* For diacritic marks and binary operators without a left operand,
default to an empty box.
* Let piles and matrices take one argument rather than insisting
on a braced list. Let HTML output handle that, too.
* When rewinding, if the root box is guaranteed to match
the termination condition, no error handling is needed.
Ingo Schwarze [Sat, 11 Oct 2014 21:14:16 +0000 (21:14 +0000)]
warn about parentheses in function names after .Fn and .Fo;
particularly useful when converting from other languages to mdoc(7);
feature suggested by bentley@
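A minimal Python sketch of such a check, for illustration only. The function name check_function_name and the wording of the message are assumptions, not taken from the commit.

```python
def check_function_name(name):
    """Return a warning string if the name contains a parenthesis, else None."""
    pos = next((i for i, ch in enumerate(name) if ch in "()"), -1)
    if pos < 0:
        return None
    # Message text is illustrative only.
    return "parenthesis in function name: {} (column {})".format(name, pos + 1)
```

A page converted from another format might contain ".Fn strlcpy()" where ".Fn strlcpy" was meant; the check flags the first form and accepts the second.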
Ingo Schwarze [Fri, 10 Oct 2014 12:19:25 +0000 (12:19 +0000)]
Make eqn(7) -Ttree output more useful:
* Reduce noise by not printing default attributes.
* Print missing "top" and "bottom" attributes.
* Print mnemonics, not code numbers for expression positions.
* Do not print unused "pile" attribute.
Re-write of eqn(7) parser and MathML output.
This adds parser-level support for the grammar described by the eqn
second-edition technical paper, "Typesetting Mathematics — User's Guide"
(Kernighan, Cherry).
The reason for this re-write is the grouping rules, which were not
possible given the existing implementation.
The re-write has also considerably simplified the HTML (and, if it ever
is completed, terminal) front-end.
Ingo Schwarze [Tue, 7 Oct 2014 14:07:03 +0000 (14:07 +0000)]
If a tbl(7) layout contains unknown font modifiers, fall back to the
default font rather than failing the whole table.
Needed by some pages in books/man-pages-posix.
Written on the plane back from EuroBSDCon in Sofia.
Crudely accommodate matrices by way of adjacent tables. We don't do this
nicely right now because eqn uses column ordering.
Also add from/to support and to support.
Support a decent subset of eqn(7) in MathML.
This has basic support for positions (under, sup, sub, sub/sup) and piles.
It *does not* support right-left grouping (among many other things), e.g.,
Remove <p> in favour of <div class="spacer">.
This is good because <p> is brittle: it can't appear within other block
macros.
This fixes a regression of the original HTML5 patch as noted by schwarze@
on the tech@ list, 14/8/2014.
First, add space for default styling for HTML5 (non-fragment) output.
This uses a <style /> block right before the <link /> for the stylesheet.
Use this to kick out hardcoded header and footer table widths.
Five-year-old typo reported by Theo Buehler at math dot ethz dot ch, thanks.
I nearly asked: ``What's wrong with it? It formats as "intended".''
(However, what Kristaps intended to write was "indented".)
Support backslash-escaping of white space in the query expression,
to be more similar to apropos(1) called from the shell.
Missing feature reported by Marcus MERIGHI <mcmer dash openbsd at
tor dot at> on misc@.
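The idea can be sketched as a tokenizer that splits on white space unless it is protected by a backslash. This Python version is only an illustration; the name split_query is invented, and leaving a backslash alone when it precedes a non-blank character is an assumption.

```python
def split_query(expr):
    """Split on unescaped white space; a backslash protects a following blank."""
    words = []
    current = []
    in_word = False
    i = 0
    while i < len(expr):
        ch = expr[i]
        if ch == "\\" and i + 1 < len(expr) and expr[i + 1].isspace():
            current.append(expr[i + 1])  # keep the blank, drop the backslash
            in_word = True
            i += 2
            continue
        if ch.isspace():
            if in_word:  # an unescaped blank ends the current word
                words.append("".join(current))
                current = []
                in_word = False
        else:
            current.append(ch)
            in_word = True
        i += 1
    if in_word:
        words.append("".join(current))
    return words
```

With this, a query such as `Nd=file\ system -a copy` yields three words, the first of which contains a blank, much as the shell would pass it to apropos(1).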
If a manpath directory (for example, a _whatdb entry from man.conf(5)
or an entry in the MANPATH environment variable) does not exist,
silently skip it. This brings makewhatis(8) back closer to the
behaviour of espie@'s version and ought to shut up the weekly(8)
whining observed by henning@ on machines not having xbase installed.
Also, don't error out after the first unusable manpath entry, still
try the others.
Of course, still complain about non-existent directories specified
on the command line and about any directories failing for
reasons other than ENOENT.
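The policy described above can be sketched as follows. This is a Python illustration, not the makewhatis(8) code; the function name usable_dirs and the from_cmdline flag are hypothetical.

```python
import errno
import os
import sys


def usable_dirs(dirs, from_cmdline):
    """Return the directories that can be scanned, reporting problems on stderr."""
    usable = []
    for d in dirs:
        try:
            os.listdir(d)
        except OSError as e:
            if e.errno == errno.ENOENT and not from_cmdline:
                continue  # missing configured or MANPATH entry: skip silently
            print("{}: {}".format(d, e.strerror), file=sys.stderr)
            continue  # keep going; one bad entry must not stop the others
        usable.append(d)
    return usable
```

The two `continue` statements carry the two behavioural changes: silence for missing configured directories, and no early exit after the first unusable entry.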
Do not report a page as arch=any merely because .Dt lacks the third argument.
Pages found outside arch-specific dirs still get arch=any, of course.
Issue reported by justinhenryhaynes at gmail dot com on misc@, thanks!
Simplify by handling empty request lines at the one logical place
in the roff parser instead of in three other places in other parsers.
No functional change.
Move main format autodetection from the parser dispatcher to the
roff parser where .Dd and .TH are already detected, anyway. This
improves robustness because it correctly handles whitespace or an
alternate control character before Dd. In the parser dispatcher,
provide a fallback looking ahead in the input buffer instead of
always assuming man(7). This corrects autodetection when Dd is
preceded by other macros or by requests handled like macros, such as .ll.
Triggered by reports from Daniel Levai about issues on Slackware Linux.
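The look-ahead fallback can be pictured with this Python sketch. It is an illustration under simplifying assumptions: the name detect_format is invented, only the default control characters "." and "'" are recognized, and alternate control characters are not modelled.

```python
import re

# A request line starts with a control character, optionally followed
# by white space, then the request or macro name.
REQUEST = re.compile(r"^[.'][ \t]*(\S+)", re.MULTILINE)


def detect_format(buf, default="man"):
    """Scan ahead in the input buffer for the first .Dd or .TH line."""
    for match in REQUEST.finditer(buf):
        name = match.group(1)
        if name == "Dd":
            return "mdoc"
        if name == "TH":
            return "man"
    return default  # nothing conclusive found
```

Because the scan skips over other request lines, an input that starts with ".ll 72" and only then has ".Dd" is still detected as mdoc(7).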