Add initial libmdoc and libman top-most machinery for accepting EQN
directives. For now this will just ignore them (except for -Ttree,
which just notes that an EQN's been accepted).
Add initial EQN support to mandoc. This parses, then throws away, data
between EQ and EN roff blocks. EQN is different from TBL in that data
after .EQ is unilaterally considered an equation until an .EN. Thus,
there's no need to jump through hoops in having table spans and so on.
This is ONLY the parse code framework in libroff. EQN is not yet passed
into the backends.
If `Ns' is specified on its own line, it should be ignored. This is
shitty groff behaviour. Do the same, but raise a warning to this
effect. This is from a TODO noted by schwarze@.
Ingo Schwarze [Sun, 30 Jan 2011 16:05:37 +0000 (16:05 +0000)]
Implement the \N'number' (numbered character) roff escape sequence.
Don't use it in new manuals, it is inherently non-portable, but we
need it for backward-compatibility with existing manuals, for example
in Xenocara driver pages.
ok kristaps@ jmc@ and tested by Matthieu Herrb (matthieu at openbsd dot org)
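A minimal sketch of how a \N'number' escape could be recognised; this is
not the mandoc code. The function name parse_numbered() and the upper
bound on the number are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch: cp points just past the backslash of a
 * \N'number' escape.  On success, return the character number and
 * store the count of bytes consumed in *used; return -1 if the
 * escape is malformed.  The upper bound is an arbitrary guard
 * against overflow, not a documented limit.
 */
static int
parse_numbered(const char *cp, size_t *used)
{
	size_t	 i;
	int	 num;

	assert(cp != NULL && used != NULL);

	if (cp[0] != 'N' || cp[1] != '\'')
		return -1;

	num = 0;
	for (i = 2; cp[i] >= '0' && cp[i] <= '9'; i++) {
		num = num * 10 + (cp[i] - '0');
		if (num > 0x10FFFF)
			return -1;
	}

	/* Require at least one digit and the closing quote. */
	if (i == 2 || cp[i] != '\'')
		return -1;

	*used = i + 1;
	return num;
}
```

A caller would skip *used bytes on success and fall back to its normal
handling of unknown escapes on -1.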
Ingo Schwarze [Tue, 25 Jan 2011 12:24:27 +0000 (12:24 +0000)]
Since tbl_data() can now produce multiple spans, let parsebuf()
generate man(7) or mdoc(7) nodes for all these spans,
not only for the last one.
Restores the horizontal lines in the cpu(4/hppa) tables.
ok kristaps@
Ingo Schwarze [Tue, 25 Jan 2011 12:16:22 +0000 (12:16 +0000)]
Do not skip data after horizontal lines in the layout.
Instead, let one line of input data add two new spans
to the tbl tree during one single call of tbl_data().
Note that this causes the horizontal line to get parsed
into the tbl tree, but not yet used in the output,
which will be fixed next.
Avoids data loss in cpu(4/hppa).
ok kristaps@
Ingo Schwarze [Tue, 25 Jan 2011 01:12:02 +0000 (01:12 +0000)]
Ignore .ns (no-space mode), .ps (change point size), .ta (tab control)
for now. All of these just cause a bit too much or too little
whitespace, but no serious formatting problems.
Triggered by reports from brad@.
Ingo Schwarze [Tue, 25 Jan 2011 00:40:14 +0000 (00:40 +0000)]
As noticed by deraadt@, it goes without saying that text files
on a UNIX system use UNIX conventions, and UNIX tools working
on them expect that.
ok jmc@
Ingo Schwarze [Mon, 24 Jan 2011 23:41:55 +0000 (23:41 +0000)]
Skip carriage return before newline, if any.
As pointed out by Joerg Sonnenberger, this is useful
because we use mmap(3) and look for '\n' by hand.
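A minimal sketch of the idea, assuming the input is one flat buffer that
is scanned for '\n' by hand. The helper name line_length() is
hypothetical and does not appear in mandoc.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the length of the first line in
 * buf[0..sz), excluding the terminating "\n" or "\r\n".
 * On return, *next is the offset of the following line
 * (or sz if the buffer ends without a newline).
 */
static size_t
line_length(const char *buf, size_t sz, size_t *next)
{
	size_t	 i;

	assert(buf != NULL && next != NULL);

	for (i = 0; i < sz; i++)
		if (buf[i] == '\n')
			break;

	*next = (i < sz) ? i + 1 : sz;

	/* Skip a carriage return directly before the newline, if any. */
	if (i < sz && i > 0 && buf[i - 1] == '\r')
		i--;

	return i;
}
```

A carriage return that is not followed by a newline is left in the line
untouched.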
"check it in" kristaps@
Ingo Schwarze [Sat, 22 Jan 2011 14:00:52 +0000 (14:00 +0000)]
Check argument count validation for all in_line() macros.
Most empty in_line() macros are already removed by the parser,
so there is no need to check again in mdoc_validate.c.
This also downgrades almost all remaining argument count issues
from ERROR to WARNING.
ok kristaps@
Ingo Schwarze [Sat, 22 Jan 2011 13:16:02 +0000 (13:16 +0000)]
When finding the roff .it request (line trap),
make it clear that you cannot use mandoc to format that page (yet).
Triggered by a report from brad@, ok kristaps@.
Ingo Schwarze [Mon, 17 Jan 2011 00:21:29 +0000 (00:21 +0000)]
Refrain from throwing fatal errors for
* .br .sp .nf .fi .na with arguments - just skip the arguments
* .TH lacking arguments - use empty strings instead like groff
* .TH with excessive arguments - skip those
Reminded by joerg@, ok kristaps@.
Ingo Schwarze [Sun, 16 Jan 2011 20:12:45 +0000 (20:12 +0000)]
When processing a blank text line, do not break out of text processing
into macro processing code. Fixing a regression introduced in 1.95,
found because it caused segfaults in my regression suite.
OK kristaps@
Ingo Schwarze [Sun, 16 Jan 2011 04:00:34 +0000 (04:00 +0000)]
Implement the roff .rm request (remove macro).
Using the new roff_getname() function, this is really simple.
Breaks mandoc of the habit of reporting an error in each pod2man(1) preamble.
Reminded by a report from brad@; ok kristaps@.
Change how -Thtml behaves with tables: use multiple rows, with widths
set by COL, until an external macro is encountered. At that point,
close out the table and process the macro. When a table row is next
encountered, re-start the table. This requires a bit of tracking
added to "struct html", but the change is very small and follows the
logic of meta-fonts. This all follows a bug-report by joerg@.
Downgrade -man message of ignored empty paragraph to MANDOC_IGNPAR. The
change in man_macro.c fixes an assertion failure caused by a subtle problem:
(1) the macro is removed, causing m->last to become m->last->parent; (2) by
jumping to m->last->parent after post-validation, the original
m->last->parent is skipped; (3) the rewinder climbs to the root of the
tree and aborts.
The original issue recorded in the TODO by schwarze@, reminded by Brad
Smith.
Make -man -Tascii not break within literal lines, e.g.,
.nf
.B hello world
.fi
Also, clean up the print_man_node() function a little bit. This problem
has long since been in the TODO and was recently noted again by Brad
Smith. The -T[x]html fix will follow...
Add support for "^" vertical spanners. Unlike GNU tbl, raise
error-class messages when data is being ignored by specifying it in "^"
cells (either as-is or in blocks).
Also note again that horizontal spanners aren't really supported...
Ingo Schwarze [Tue, 11 Jan 2011 00:11:45 +0000 (00:11 +0000)]
Refactoring in preparation for .rm support:
Unify parsing of names given as roff request arguments into a new
function roff_getname(), which is rather different from the parsing
function for normal arguments, mandoc_getarg(), because names cannot
be quoted and cannot contain whitespace or escaped characters.
The new function now throws an ERROR when finding escaped characters
in a name.
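The rules for names described above can be sketched roughly as follows.
This is an illustration only: parse_name() is a hypothetical stand-in,
not the roff_getname() signature, and error reporting is left to the
caller.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical sketch of parsing a roff name: the name ends at the
 * first blank or at the end of the string, and quotes have no
 * special meaning.  A backslash makes the name invalid.
 * Stores the length scanned so far in *len and returns 1 if the
 * name is valid, or 0 if it contains an escape.
 */
static int
parse_name(const char *cp, size_t *len)
{
	size_t	 i;

	assert(cp != NULL && len != NULL);

	for (i = 0; cp[i] != '\0' && cp[i] != ' ' && cp[i] != '\t'; i++) {
		if (cp[i] == '\\') {
			*len = i;
			return 0;	/* caller would report an ERROR */
		}
	}

	*len = i;
	return 1;
}
```

Note that a leading quote is simply part of the name here, which is the
main difference from parsing a normal argument.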
"I'm fine with this." kristaps@
When a row of data is being parsed and it's a line or double-line
(instead of data), re-use the last "layout" pointer instead of advancing
to the next one.
Fixes: T} can be followed by a delimiter then more data. Make this
work and add documentation for it.
Also make tbl_term() not puke if the number of data cells is less than
the number of layout cells (which happens from time to time). This
still needs work because we should pad out empty cells so that the
borders all work out.
Stuff tbl_calc() into out.c so that it can be shared by all output modes
(isn't now, but will need to be, used by -T[x]html also). Necessitated
a lot of churn in getting tbl_calc* code out of tbl_term.c and into
out.c, including renaming some structures and so on. The abstraction is
in having a pointer to a wrapper function for calculating string widths.
The char devices use term_strlen and term_len; the others will probably
just use strlen().
While at it, remove some superfluous assertions in the tbl code. This
allows all tbl manuals to clear.
Lastly, set the right-margin to be the maximum margin for each table
span. This allows big, complicated tbl-pages like terminfo to be
displayed. They're ugly, but they work.
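The abstraction of passing a width-measuring function into the shared
calculation might look roughly like this. All names here (width_fn,
plain_width(), calc_widths()) are hypothetical; the real interface in
out.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: measure the display width of a string. */
typedef size_t (*width_fn)(const char *, void *);

/* Measurer for an output mode without escapes: plain strlen(). */
static size_t
plain_width(const char *s, void *arg)
{
	(void)arg;
	return strlen(s);
}

/*
 * Compute the maximum width of each column for nrows x ncols cells
 * stored row by row.  NULL cells count as empty.
 */
static void
calc_widths(const char *const *cells, size_t nrows, size_t ncols,
    width_fn fn, void *arg, size_t *widths)
{
	size_t	 r, c, w;

	assert(cells != NULL && fn != NULL && widths != NULL);

	for (c = 0; c < ncols; c++)
		widths[c] = 0;

	for (r = 0; r < nrows; r++) {
		for (c = 0; c < ncols; c++) {
			if (cells[r * ncols + c] == NULL)
				continue;
			w = fn(cells[r * ncols + c], arg);
			if (w > widths[c])
				widths[c] = w;
		}
	}
}
```

A character device would pass a measurer that understands escapes in
place of plain_width(); the calculation itself stays the same.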
Ingo Schwarze [Tue, 4 Jan 2011 23:48:39 +0000 (23:48 +0000)]
Merge from OpenBSD (similar to my original fix committed on Oct 15, 2010):
For now, parse and ignore minimal column width specifications.
First step to get terminfo(5) to build.
Support `T{' and `T}' data blocks. When a standalone `T{' is
encountered as a line's last data cell, move into TBL_PART_CDATA mode
whilst leaving the cell's designation as TBL_DATA_NONE. When new data
arrives that's not a standalone `T}', append it to the cell contents.
Close out and warn appropriately.
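A simplified sketch of the mode switch described above. It assumes the
`T{' occupies the whole line, which is narrower than the real rule (last
data cell of a line), and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum part { PART_DATA, PART_CDATA };	/* hypothetical parser modes */

struct cell {
	enum part	 mode;
	char		 text[256];
};

/*
 * Feed one input line to the cell.  A standalone "T{" opens a text
 * block, a standalone "T}" closes it, and any other line seen
 * while the block is open is appended to the cell contents.
 * Returns 1 if the line was consumed by the block, else 0.
 */
static int
cell_feed(struct cell *c, const char *line)
{
	size_t	 used;

	assert(c != NULL && line != NULL);

	if (c->mode == PART_DATA) {
		if (strcmp(line, "T{") != 0)
			return 0;
		c->mode = PART_CDATA;
		c->text[0] = '\0';
		return 1;
	}

	if (strcmp(line, "T}") == 0) {
		c->mode = PART_DATA;
		return 1;
	}

	/* Join lines with a blank; overlong input is truncated. */
	used = strlen(c->text);
	snprintf(c->text + used, sizeof(c->text) - used, "%s%s",
	    used ? " " : "", line);
	return 1;
}
```

A real parser would also warn if the input ends while the cell is still
in the open state.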
Fix to make horizontal spanners in the layout be properly printed.
mandoc also now warns (so does tbl(1)) if a horizontal spanner is
specified along with data.
While here, fix up some documentation and uncomment the tbl reference.
Ingo Schwarze [Tue, 4 Jan 2011 01:23:18 +0000 (01:23 +0000)]
Multiple man(7) .IP and .TP fixes started during p2k10:
Affecting both -Tascii and -Thtml:
* The .IP HEAD uses the second argument as the width, not the last one.
* Only print the first .IP HEAD argument, not all but the last.
Affecting only -Tascii:
* The .IP and .TP HEADs must be printed without literal mode,
but literal mode must be restored afterwards.
* After the .IP and .TP bodies, we only want term_newln(), not
term_flushln(), or we would get two blank lines in literal mode.
* The .TP HEAD does not use TWOSPACE, just like .IP doesn't either.
* In literal mode, clear NOLPAD after each line, or subsequent lines
would get no indentation whatsoever.
Affecting only -Thtml:
* Only print next-line .TP children, instead of all but the first.
OK kristaps@ on the -Tascii part; and:
"Can you work this into man_html.c, too?"
Ingo Schwarze [Mon, 3 Jan 2011 23:53:51 +0000 (23:53 +0000)]
Partial cleanup of argument count validation in mdoc(7):
* Do not segfault on empty .Db, .Rs, .Sm, and .St.
* Let check_count() really throw the requested level, not always ERROR.
* Downgrade most bad argument counts from ERROR to WARNING.
* And some related internal cleanup.
Looks fine to kristaps@.
Note that the macros using eerr_ge1() still need to be checked at a later
time; but as all the others are done, let's use what we already have.
Ingo Schwarze [Mon, 3 Jan 2011 23:24:16 +0000 (23:24 +0000)]
Calling a macro with fewer arguments than it is defined with is OK;
the remaining ones default to the empty string, not to NULL.
Regression reported and fix tested by kristaps@.
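The defaulting rule can be sketched as a small accessor. The name
macro_arg() and the argv/argc layout are assumptions for illustration,
not the data structures used in roff.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical accessor: return the i-th (zero-based) argument of a
 * macro call, or the empty string if the call supplied fewer
 * arguments.  Never returns NULL.
 */
static const char *
macro_arg(const char *const *argv, size_t argc, size_t i)
{
	assert(argv != NULL || argc == 0);

	return (i < argc && argv[i] != NULL) ? argv[i] : "";
}
```

Code expanding the macro body can then use the result directly without
checking for NULL.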
Ingo Schwarze [Mon, 3 Jan 2011 22:42:37 +0000 (22:42 +0000)]
Unify roff macro argument parsing (in roff.c, roff_userdef()) and man macro
argument parsing (in man_argv.c, man_args()), both having different bugs,
to use one common macro argument parser (in mandoc.c, mandoc_getarg()),
because from the point of view of roff, man macros are just roff macros,
hence their arguments are parsed in exactly the same way.
While doing so, fix these bugs:
* Escaped blanks (i.e. those preceded by an odd number of backslashes)
were mishandled as argument separators in unquoted arguments to
user-defined roff macros.
* Unescaped blanks preceded by an even number of backslashes were not
recognized as argument separators in unquoted arguments to man macros.
* Escaped backslashes (i.e. pairs of backslashes) were not reduced
to single backslashes both in unquoted and quoted arguments both
to user-defined roff macros and to man macros.
* Escaped quotes (i.e. pairs of quotes inside quoted arguments) were
not reduced to single quotes in man macros.
OK kristaps@
Note that mdoc macro argument parsing is yet another beast for no good
reason and is probably afflicted by similar bugs. But I don't attempt
to fix that right now because it is intricately entangled with lots of
unrelated high-level mdoc(7) functionality, like delimiter handling and
column list phrase handling. Disentangling that would waste too much
time now.
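The four rules listed above can be illustrated with a small in-place
parser. This is a sketch written for this log, not the body of
mandoc_getarg(); the name get_arg() and the choice to leave escapes
other than a backslash pair untouched are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: extract the next argument from *cpp in place.
 *  - An unquoted argument ends at the first blank that is not
 *    preceded by an odd number of backslashes.
 *  - A quoted argument ends at a lone '"'; a pair '""' yields '"'.
 *  - A pair of backslashes is reduced to a single backslash.
 * Returns the NUL-terminated argument and advances *cpp past it
 * and past any blanks that follow.
 */
static char *
get_arg(char **cpp)
{
	char	*start, *src, *dst;
	int	 quoted;

	assert(cpp != NULL && *cpp != NULL);

	src = *cpp;
	quoted = (*src == '"');
	if (quoted)
		src++;
	start = dst = src;

	while (*src != '\0') {
		if (src[0] == '\\' && src[1] == '\\') {
			*dst++ = '\\';		/* reduce the pair */
			src += 2;
		} else if (src[0] == '\\' && src[1] != '\0') {
			*dst++ = *src++;	/* keep other escapes, */
			*dst++ = *src++;	/* e.g. an escaped blank */
		} else if (quoted && src[0] == '"' && src[1] == '"') {
			*dst++ = '"';		/* reduce the pair */
			src += 2;
		} else if (quoted && src[0] == '"') {
			src++;			/* closing quote */
			break;
		} else if (!quoted && src[0] == ' ') {
			break;			/* argument separator */
		} else {
			*dst++ = *src++;
		}
	}

	src += strspn(src, " ");
	*dst = '\0';
	*cpp = src;
	return start;
}
```

Because reductions only ever shorten the text, writing the result back
into the input buffer never overtakes the read position.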
Switch on the `TS' documentation in roff.7. As per off-line discussion,
this may be moved to tbl.7, but for the time being, keep it in the
document as it's developed.
Also note that my handling of horizontal rules in layouts needs some
work.
Add in support for number table cells that account for escapes and so
on. Note also that -Tps and -Tpdf, with these last two commits, produce
more readable output ("less crappy").
Start using term_strlen() instead of strlen(). tbl_term.c can now
properly handle embedded escapes when calculating its widths. NOTE:
this doesn't yet apply to the decimal-point calculation.
Make width calculations occur within tbl_term.c, not tbl.c. This allows
for front-ends to make decisions about widths, not the back-end.
To pull this off, first make each tbl_head contain a unique index value
(0 <= index < total tbl_head elements) and remove the tbl_calc() routine
from the back-end.
Then, when encountering the first tbl_span in the front-end, dynamically
create an array of configurations (termp_tbl) keyed on each tbl_head's
unique index value. Construct the decimals and widths at this time,
then continue parsing as before.
The termp_tbl and indexes are required because we pass a const tbl AST
into the front-end.
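A rough sketch of keeping per-column state outside a const tree, keyed
on each head's unique index. The structure and function names are
hypothetical stand-ins for tbl_head, termp_tbl and friends, and
strlen() stands in for the real width measurement.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct head {			/* hypothetical: one per column */
	size_t		 ident;	/* unique, 0 <= ident < nheads */
};

struct cellw {			/* hypothetical: one data cell */
	const struct head *head;
	const char	*text;
};

struct colconf {		/* per-column state, owned by the caller */
	size_t		 width;
};

/*
 * Build an array of column configurations indexed by head->ident.
 * The cells and heads are const, so all results live in the
 * returned array, which the caller must free().
 */
static struct colconf *
conf_alloc(const struct cellw *cells, size_t ncells, size_t nheads)
{
	struct colconf	*conf;
	size_t		 i, id, w;

	conf = calloc(nheads, sizeof(*conf));
	if (conf == NULL)
		return NULL;

	for (i = 0; i < ncells; i++) {
		id = cells[i].head->ident;
		assert(id < nheads);
		w = strlen(cells[i].text);
		if (w > conf[id].width)
			conf[id].width = w;
	}
	return conf;
}
```

The index lookup is what lets the output code attach widths to columns
without writing into the tree it was handed.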
Make sure we don't continue recursively parsing once we've exited with
failure (this had caused some segfaults with the new assert() call in
MAN_HALT and MDOC_HALT).
Clarified the role of MDOC_HALT in libmdoc functions by having accessor
functions assert() if they're called after MDOC_HALT is set.
This makes more sense than returning 0 because this return value is used
for parse errors, not programme-flow errors, and it's inconsistent to
use the same value for both. Plus, prior to this, I'd return 0 without
printing an error message, which would cause failure to go unreported to
the operator.
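The distinction between the two kinds of failure can be sketched as
follows. PARSE_HALT, struct parser and both functions are hypothetical;
treating a NULL line as the halting condition is only a placeholder for
whatever actually halts the parser.

```c
#include <assert.h>
#include <stddef.h>

#define	PARSE_HALT	(1 << 0)	/* hypothetical, after MDOC_HALT */

struct parser {
	int	 flags;
	int	 nlines;
};

/* Record a fatal failure: no further input may be parsed. */
static void
parser_halt(struct parser *p)
{
	p->flags |= PARSE_HALT;
}

/*
 * Parse one line.  Returns 1 on success and 0 on a parse error.
 * Calling this after a halt is a bug in the caller, so it trips an
 * assertion instead of overloading the 0 return value.
 */
static int
parser_line(struct parser *p, const char *line)
{
	assert(!(p->flags & PARSE_HALT));

	if (line == NULL) {
		parser_halt(p);
		return 0;
	}
	p->nlines++;
	return 1;
}
```

With this split, a 0 return always means a reported parse error, and a
misuse of the interface fails loudly rather than silently.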
Turn on -Tascii tbl printing. The output still has some issues---I'm
not sure whether it's in the header calculation or term.c squashing
spaces or whatever, but let's get this in for general testing as soon as
possible.
Churn to get parts of 'struct tbl' visible from mandoc.h: rename the
existing 'struct tbl' as 'struct tbl_node', then move all option stuff
into a 'struct tbl' in mandoc.h.
This conflicted with a structure in chars.c, which was renamed.