.\" $Id: mandoc.3,v 1.37 2016/07/07 19:19:01 schwarze Exp $
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2016 Ingo Schwarze <schwarze@openbsd.org>
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.Dd $Mdocdate: July 7 2016 $
.Nd mandoc macro compiler library
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Fa "enum mandoclevel wlevel"
.Fa "enum mandocerr errtype"
.Fa "enum mandoclevel level"
.Fa "const char *file"
.Fa "struct mparse *parse"
.Fa "const struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Ft "enum mandoclevel"
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fa "struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "struct roff_man **man"
.Fa "enum mandoclevel"
.Fa "const struct roff_node *node"
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.Fa "struct roff_man *mdoc"
.Vt extern const char * const * man_macronames;
.Ft "const struct mparse *"
.Fa "const struct roff_man *man"
.Fa "struct roff_man *man"
manual into an abstract syntax tree (AST).
manuals are composed of
and may be mixed with
The following describes a general parse sequence:
initiate a parsing sequence with
retrieve the syntax tree with
depending on whether the
member of the returned
iterate over parse nodes, starting from the
member of the returned
.Vt struct roff_man ;
free all allocated memory with
and go back to step 2 to parse new files.
This section documents the functions, types, and variables available
with the exception of those documented in
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
as regards system operation.
See the DIAGNOSTICS section in
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
This may be used across parsed input if
is called between parses.
A prototype for a function to handle error and warning
messages emitted by the parser.
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Vt struct roff_man .
When it is no longer needed, the pointer returned from
Get the parser used for the current output.
parse tree obtained with
parse tree obtained with
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
bit is set, parsing is aborted after the NAME section.
This is for example useful in
to quickly build minimal databases.
.Dv MANDOCLEVEL_BADARG ,
.Dv MANDOCLEVEL_ERROR ,
.Dv MANDOCLEVEL_WARNING .
Messages below the selected level will be suppressed.
A callback function to handle errors and warnings.
If printing of error messages is not desired,
A default string for the
macro, overriding the
preprocessor definition and the results of
The same parser may be used for multiple files so long as
is called between parses.
must be called to free the memory allocated by this function.
Free all memory allocated by
.It Fn mparse_getkeep
Acquire the keep buffer.
Must follow a call of
Instruct the parser to retain a copy of its parsed input.
This can be acquired with subsequent
Open the file for reading.
does not already end in
try again after appending
Record whether or not the file is zipped.
Return a file descriptor open for reading or -1 on failure.
Parse a file descriptor opened with
Pass the associated filename in
This function may be called multiple times with different parameters; however,
should be invoked between parses.
Reset a parser so that
Obtain the result of a parse.
One of the two pointers will be filled in.
.It Fn mparse_strerror
Return a statically-allocated string representation of an error code.
.It Fn mparse_strlevel
Return a statically-allocated string representation of a level code.
.It Va man_macronames
The string representation of a
The string representation of an
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
escapes preserved from input.
Implementing systems will need to handle both situations to produce
In general, strings may be assumed to consist of 7-bit ASCII characters.
The following non-printing characters may be embedded in text strings:
A non-breaking space character.
A breakable zero-width space.
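.Pp
As a minimal sketch of how a consumer might map these cues to plain
ASCII, the following helper copies a text string and rewrites the three
characters.
The function name
plain_text
is hypothetical, and the numeric values given to the constants are
assumptions made so that the example compiles on its own; real code
should take the definitions from the library header.
The mapping of ASCII_HYPH to a plain hyphen is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; prefer the library header. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP 31	/* non-breaking space */
#define ASCII_HYPH  30	/* hyphen at which a line may be broken */
#define ASCII_BREAK 29	/* breakable zero-width space */
#endif

/*
 * Copy src into dst, which holds dstsz bytes, replacing the
 * non-printing cues with plain ASCII.  Output is truncated to fit.
 * Returns the number of bytes written, not counting the final NUL.
 */
static size_t
plain_text(char *dst, size_t dstsz, const char *src)
{
	size_t n = 0;

	assert(dst != NULL && src != NULL && dstsz > 0);

	for (; *src != '\0' && n + 1 < dstsz; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			dst[n++] = ' ';
			break;
		case ASCII_HYPH:
			dst[n++] = '-';
			break;
		case ASCII_BREAK:
			/* Zero width: nothing to print. */
			break;
		default:
			dst[n++] = *src;
			break;
		}
	}
	dst[n] = '\0';
	return n;
}
```

A formatter that performs its own line breaking would instead record
the positions of ASCII_HYPH and ASCII_BREAK as break opportunities
rather than discarding that information.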
Escape characters are also passed verbatim into text strings.
An escape character is a sequence of characters beginning with the
To construct human-readable text, these should be intercepted with
and converted with one of the functions described in
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
and derives its terminology accordingly.
The AST is composed of
nodes with element, root and text types as declared by the
Each node also provides its parse point (the
fields), its position in the tree (the
fields) and some type-specific data.
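.Pp
The way such position fields support tree traversal can be sketched
with a simplified stand-in type.
Everything in the following example, including
struct toy_node
and its member names, is hypothetical and is not the library's node
definition; it only illustrates a depth-first walk over first-child and
next-sibling links.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node kinds mirroring root, element and text nodes. */
enum toy_type { TOY_ROOT, TOY_ELEM, TOY_TEXT };

/* Hypothetical stand-in for a syntax tree node. */
struct toy_node {
	enum toy_type	 type;
	struct toy_node	*parent;
	struct toy_node	*child;	/* first child */
	struct toy_node	*next;	/* next sibling */
};

/* Append child as the last child of parent. */
static void
toy_append(struct toy_node *parent, struct toy_node *child)
{
	struct toy_node **slot;

	assert(parent != NULL && child != NULL);
	for (slot = &parent->child; *slot != NULL; slot = &(*slot)->next)
		continue;
	*slot = child;
	child->parent = parent;
	child->next = NULL;
}

/* Count the nodes of the given type in the subtree rooted at n. */
static size_t
count_type(const struct toy_node *n, enum toy_type type)
{
	const struct toy_node *c;
	size_t total;

	if (n == NULL)
		return 0;
	total = (n->type == type) ? 1 : 0;
	for (c = n->child; c != NULL; c = c->next)
		total += count_type(c, type);
	return total;
}
```

The same recursion pattern applies to any per-node action, such as
printing text nodes in document order.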
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Bl -tag -width "ELEMENTXX" -compact
\(<- ELEMENT | TEXT | BLOCK
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
and derives its terminology accordingly.
elements described in
are described simply as
The AST is composed of
nodes with block, head, body, element, root and text types as declared
Each node also provides its parse point (the
fields), its position in the tree (the
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Bl -tag -width "ELEMENTXX" -compact
\(<- BLOCK | ELEMENT | TEXT
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
\(<- mnode* [ENDBODY mnode*]
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
where a new body introduces a new phrase.
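.Pp
To make the BLOCK production concrete, the following sketch checks
whether a sequence of child kinds matches
HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]].
The enumeration and the function
block_ok
are hypothetical names introduced for illustration; the library itself
exposes no such checker, and the example deliberately ignores the
ENDBODY case.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child kinds, named after the non-terminals. */
enum kid { KID_HEAD, KID_BODY, KID_TAIL, KID_TEXT };

/*
 * Return 1 if the n entries of kids[] match
 * HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]], and 0 otherwise.
 */
static int
block_ok(const enum kid *kids, size_t n)
{
	size_t i = 0;

	assert(kids != NULL || n == 0);

	/* Exactly one HEAD, optionally followed by TEXT. */
	if (i >= n || kids[i] != KID_HEAD)
		return 0;
	i++;
	if (i < n && kids[i] == KID_TEXT)
		i++;

	/* One or more BODY nodes, each optionally followed by TEXT. */
	if (i >= n || kids[i] != KID_BODY)
		return 0;
	while (i < n && kids[i] == KID_BODY) {
		i++;
		if (i < n && kids[i] == KID_TEXT)
			i++;
	}

	/* An optional TAIL, optionally followed by TEXT. */
	if (i < n && kids[i] == KID_TAIL) {
		i++;
		if (i < n && kids[i] == KID_TEXT)
			i++;
	}

	/* Anything left over violates the production. */
	return i == n;
}
```

Because each alternative in the production starts with a distinct node
kind, a single left-to-right pass without backtracking is sufficient.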
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
field, is of the BODY
as the BLOCK it is ending, and has a
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Bo, pending -> Ao
ENDBODY Ao, pending -> Ao
Here, the formatting of the
block extends from TEXT ao to TEXT ac,
while the formatting of the
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Dl <ao [bo ac> bc] end
Support for badly-nested blocks is only provided for backward
compatibility with some older
Using badly-nested blocks is
.Em strongly discouraged ;
are unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .