.\" $Id: mandoc.3,v 1.38 2017/01/09 01:37:03 schwarze Exp $
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.Dd $Mdocdate: January 9 2017 $
.Nd mandoc macro compiler library
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Fa "enum mandoclevel wlevel"
.Fa "enum mandocerr errtype"
.Fa "enum mandoclevel level"
.Fa "const char *file"
.Fa "struct mparse *parse"
.Fa "const struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Ft "enum mandoclevel"
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fa "struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "struct roff_man **man"
.Fa "enum mandoclevel"
.Fa "struct mparse *parse"
.Fa "enum mandoclevel *rc"
.Fa "const struct roff_node *node"
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.Fa "struct roff_man *mdoc"
.Vt extern const char * const * man_macronames;
.Ft "const struct mparse *"
.Fa "const struct roff_man *man"
.Fa "struct roff_man *man"
manual into an abstract syntax tree (AST).
manuals are composed of
and may be mixed with
The following describes a general parse sequence:
initiate a parsing sequence with
retrieve the syntax tree with
depending on whether the
member of the returned
if information about the validity of the input is needed, fetch it with
.Fn mparse_updaterc ;
iterate over parse nodes starting from the
member of the returned
.Vt struct roff_man ;
free all allocated memory with
and go back to step 2 to parse new files.
This section documents the functions, types, and variables available
with the exception of those documented in
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
as regards system operation.
See the DIAGNOSTICS section in
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
This may be used across parsed input if
is called between parses.
A prototype for a function to handle error and warning
messages emitted by the parser.
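.Pp
A message handler is an ordinary function with the
.Vt mandocmsg
parameter list.
The sketch below assumes that, after the error type, level, and file name
shown in the synopsis, the remaining parameters are the line number, the
column number, and an optional message text.
The enumeration, its level names, the buffer, and the output format are
hypothetical stand-ins rather than definitions from
.In mandoc.h ,
and the error type is passed as a plain
.Vt int
so that the example compiles on its own.
Recording the last message instead of printing it keeps the sketch easy to
test; a real handler would more likely write to standard error.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in: only "higher means more severe" is assumed. */
enum demo_level { DEMO_OK, DEMO_WARNING, DEMO_ERROR, DEMO_BADARG };

static const char *const demo_level_names[] = {
	"OK", "WARNING", "ERROR", "BADARG"
};

/* Last formatted message, kept so that callers can inspect it. */
char demo_last_msg[256];

/*
 * Message handler with the parameter order assumed for mandocmsg:
 * error type, level, file, line, column, optional message text.
 */
void
demo_msg(int errtype, enum demo_level level, const char *file,
    int line, int col, const char *msg)
{
	const char	*lname;

	(void)errtype;	/* a fuller handler could map this to a string */
	assert(file != NULL);
	lname = (size_t)level < sizeof(demo_level_names) /
	    sizeof(demo_level_names[0]) ? demo_level_names[level] : "UNKNOWN";
	snprintf(demo_last_msg, sizeof(demo_last_msg), "%s:%d:%d: %s: %s",
	    file, line, col, lname, msg == NULL ? "" : msg);
}
```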
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Vt struct roff_man .
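.Pp
The core of such a conversion is a pre-order walk that appends the string
of each text node, separated by single spaces.
The following is a minimal sketch on a hypothetical three-field node
structure, not on
.Vt struct roff_node ;
the real function may treat escape sequences and whitespace differently.
Like the result of
.Fn deroff ,
the string built by the hypothetical
.Fn demo_deroff
is allocated with
.Xr malloc 3
and must be freed by the caller.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, heavily reduced node; not the real struct roff_node. */
struct mini_node {
	const char		*string;	/* text, or NULL if none */
	struct mini_node	*child;		/* first child */
	struct mini_node	*next;		/* next sibling */
};

/*
 * Append the text of n and of all its descendants to *dest,
 * separated by single spaces.  *dest must be NULL or a string
 * obtained from malloc(3); the caller frees the result.
 */
void
demo_deroff(char **dest, const struct mini_node *n)
{
	const struct mini_node	*c;
	size_t			 oldlen, addlen;
	char			*cp;

	if (n->string != NULL && *n->string != 0) {
		oldlen = *dest == NULL ? 0 : strlen(*dest);
		addlen = strlen(n->string);
		/* Room for a separating space and the terminating NUL. */
		cp = realloc(*dest, oldlen + addlen + 2);
		assert(cp != NULL);	/* a sketch; real code reports ENOMEM */
		if (oldlen > 0)
			cp[oldlen++] = ' ';
		memcpy(cp + oldlen, n->string, addlen + 1);
		*dest = cp;
	}
	for (c = n->child; c != NULL; c = c->next)
		demo_deroff(dest, c);
}
```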
When it is no longer needed, the pointer returned from
Get the parser used for the current output.
parse tree obtained with
parse tree obtained with
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
bit is set, parsing is aborted after the NAME section.
This is useful, for example, in
to quickly build minimal databases.
.Dv MANDOCLEVEL_BADARG ,
.Dv MANDOCLEVEL_ERROR ,
.Dv MANDOCLEVEL_WARNING .
Messages below the selected level will be suppressed.
A callback function to handle errors and warnings.
If printing of error messages is not desired,
A default string for the
macro, overriding the
preprocessor definition and the results of
The same parser may be used for multiple files so long as
is called between parses.
must be called to free the memory allocated by this function.
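.Pp
The interplay between the parser-selection bits and automatic detection
can be sketched as below.
The option names and values are hypothetical stand-ins for the real
.Dv MPARSE_*
constants from
.In mandoc.h ,
and the detection rule (a first macro line of
.Sq \&.Dd
selects mdoc, anything else falls back to man) is a simplifying
assumption: only the single line passed in is inspected.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the parser-selection option bits;
 * the real names and values are defined in mandoc.h.
 */
#define DEMO_MDOC	0x01	/* force the mdoc parser */
#define DEMO_MAN	0x02	/* force the man parser */

enum demo_parser { DEMO_PARSER_MDOC, DEMO_PARSER_MAN };

/*
 * Choose a parser from the option bits.  If neither bit is set,
 * fall back to a simplified detection rule (an assumption here).
 * If both bits are set, this sketch prefers mdoc.
 */
enum demo_parser
demo_pick_parser(int options, const char *firstline)
{
	assert(firstline != NULL);
	if (options & DEMO_MDOC)
		return DEMO_PARSER_MDOC;
	if (options & DEMO_MAN)
		return DEMO_PARSER_MAN;
	return strncmp(firstline, ".Dd", 3) == 0 ?
	    DEMO_PARSER_MDOC : DEMO_PARSER_MAN;
}
```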
Free all memory allocated by
.It Fn mparse_getkeep
Acquire the keep buffer.
Must follow a call to
Instruct the parser to retain a copy of its parsed input.
This can be acquired with subsequent
Open the file for reading.
does not already end in
try again after appending
Record whether the file is zipped.
Return a file descriptor open for reading or -1 on failure.
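.Pp
The lookup order described for
.Fn mparse_open
can be sketched with plain
.Xr open 2
calls.
The helper name
.Fn demo_open
and its
.Fa zipped
out-parameter are hypothetical; the real function saves that information
in the parser instead.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper mirroring the documented lookup order:
 * try fname as given; if that fails and fname does not already
 * end in ".gz", try fname.gz.  Returns an open file descriptor,
 * which the caller must close(2), or -1 on failure.  *zipped is
 * set to 1 if the file that was opened has a ".gz" suffix.
 */
int
demo_open(const char *fname, int *zipped)
{
	size_t	 len;
	char	*gzname;
	int	 fd;

	assert(fname != NULL && zipped != NULL);
	len = strlen(fname);
	*zipped = len >= 3 && strcmp(fname + len - 3, ".gz") == 0;

	/* First try the name exactly as given. */
	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;

	/* Never try "foo.gz.gz". */
	if (*zipped)
		return -1;

	/* Room for the name, the ".gz" suffix and the terminating NUL. */
	if ((gzname = malloc(len + 4)) == NULL)
		return -1;
	snprintf(gzname, len + 4, "%s.gz", fname);
	fd = open(gzname, O_RDONLY);
	free(gzname);
	if (fd != -1)
		*zipped = 1;
	return fd;
}
```

A file found only under its
.Pa .gz
name still has to be decompressed before it can be parsed; this sketch only
opens it.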
Parse a file descriptor opened with
Pass the associated filename in
This function may be called multiple times with different parameters; however,
should be invoked between parses.
Reset a parser so that
Obtain the result of a parse.
One of the two pointers will be filled in.
.It Fn mparse_strerror
Return a statically allocated string representation of an error code.
.It Fn mparse_strlevel
Return a statically allocated string representation of a level code.
.It Fn mparse_updaterc
If the highest warning or error level that occurred during the current
This is useful after calling
.It Va man_macronames
The string representation of a
The string representation of an
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
escapes preserved from input.
Implementing systems will need to handle both situations to produce
In general, strings may be assumed to consist of 7-bit ASCII characters.
The following non-printing characters may be embedded in text strings:
A non-breaking space character.
A breakable zero-width space.
Escape characters are also passed verbatim into text strings.
An escape character is a sequence of characters beginning with the
To construct human-readable text, these should be intercepted with
and converted with one of the functions described in
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
and derives its terminology accordingly.
The AST is composed of
nodes with element, root and text types as declared by the
Each node also provides its parse point (the
fields), its position in the tree (the
fields) and some type-specific data.
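.Pp
Assuming parent, first-child, and next-sibling links like those of
.Vt struct roff_node ,
a whole subtree can be visited in document order without recursion:
descend to the first child while there is one, otherwise climb until a node
with a next sibling is found.
The reduced structure and the counting function below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node types and a reduced node with tree links only. */
enum walk_type { W_ROOT, W_ELEM, W_TEXT };

struct walk_node {
	enum walk_type		 type;
	struct walk_node	*parent;	/* enclosing node */
	struct walk_node	*child;		/* first child */
	struct walk_node	*next;		/* next sibling */
};

/*
 * Count the nodes of the given type in the subtree rooted at root,
 * visiting them in pre-order (document order) without recursion.
 * Siblings of root itself are not visited.
 */
size_t
demo_count(const struct walk_node *root, enum walk_type type)
{
	const struct walk_node	*n;
	size_t			 count = 0;

	assert(root != NULL);
	n = root;
	while (n != NULL) {
		if (n->type == type)
			count++;
		if (n->child != NULL) {
			n = n->child;
			continue;
		}
		/* Climb until a node with a next sibling, but not past root. */
		while (n != root && n->next == NULL)
			n = n->parent;
		n = n == root ? NULL : n->next;
	}
	return count;
}
```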
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Bl -tag -width "ELEMENTXX" -compact
\(<- ELEMENT | TEXT | BLOCK
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
and derives its terminology accordingly.
elements described in
are described simply as
The AST is composed of
nodes with block, head, body, element, root and text types as declared
Each node also provides its parse point (the
fields), its position in the tree (the
fields) and some type-specific data; in particular, for nodes generated
from macros, the generating macro in the
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Bl -tag -width "ELEMENTXX" -compact
\(<- BLOCK | ELEMENT | TEXT
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
\(<- mnode* [ENDBODY mnode*]
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
where a new body introduces a new phrase.
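.Pp
Because the BLOCK production is a regular pattern over the children of a
block, it can be checked with a single left-to-right scan.
The sketch below takes the sequence of child node types as an array; the
enumeration and the function are hypothetical and only mirror the normal
form given above, not any validation that mandoc itself performs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for the node types that may follow a BLOCK. */
enum part { P_HEAD, P_BODY, P_TAIL, P_TEXT };

/*
 * Return 1 if seq[0..n-1] matches
 *     HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
 * and 0 otherwise.
 */
int
demo_block_ok(const enum part *seq, size_t n)
{
	size_t	i = 0, bodies = 0;

	assert(seq != NULL || n == 0);
	if (n == 0 || seq[i++] != P_HEAD)
		return 0;
	if (i < n && seq[i] == P_TEXT)		/* punctuation after HEAD */
		i++;
	while (i < n && seq[i] == P_BODY) {
		bodies++;
		i++;
		if (i < n && seq[i] == P_TEXT)	/* punctuation after BODY */
			i++;
	}
	if (bodies == 0)
		return 0;
	if (i < n && seq[i] == P_TAIL) {
		i++;
		if (i < n && seq[i] == P_TEXT)	/* punctuation after TAIL */
			i++;
	}
	return i == n;
}
```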
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
field, is of the BODY
as the BLOCK it is ending, and has a
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Bo, pending -> Ao
ENDBODY Ao, pending -> Ao
Here, the formatting of the
block extends from TEXT ao to TEXT ac,
while the formatting of the
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Dl <ao [bo ac> bc] end
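.Pp
The effect of the ENDBODY node on output can be reproduced with a small
renderer over a hypothetical, simplified node structure.
In this sketch ENDBODY is a node type of its own, HEAD nodes are omitted,
and each BODY carries its opening and closing delimiters; these are
simplifications, not the layout of
.Vt struct roff_node .
When the ENDBODY node is reached, the closing delimiter of the pending body
is printed early and that body is marked so that it is not closed again.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified node types; HEAD nodes are left out. */
enum nest_type { N_BLOCK, N_BODY, N_TEXT, N_ENDBODY };

struct nest_node {
	enum nest_type		 type;
	const char		*text;		/* N_TEXT: the word */
	const char		*open;		/* N_BODY: opening delimiter */
	const char		*close;		/* N_BODY: closing delimiter */
	int			 ended;		/* N_BODY: already closed */
	struct nest_node	*pending;	/* N_ENDBODY: body to close */
	struct nest_node	*child;		/* first child */
	struct nest_node	*next;		/* next sibling */
};

struct out {
	char	buf[128];
	int	nospace;	/* no space before the next word */
};

/* Append s, preceded by a space if spaced is set and one is due. */
static void
put(struct out *o, const char *s, int spaced)
{
	size_t	len = strlen(o->buf);

	if (spaced && len > 0 && !o->nospace && len + 1 < sizeof(o->buf))
		o->buf[len++] = ' ';
	snprintf(o->buf + len, sizeof(o->buf) - len, "%s", s);
}

static void
render(struct out *o, struct nest_node *n)
{
	for (; n != NULL; n = n->next) {
		switch (n->type) {
		case N_BLOCK:
			render(o, n->child);
			break;
		case N_BODY:
			put(o, n->open, 1);
			o->nospace = 1;
			render(o, n->child);
			/* Skip the closing delimiter if ENDBODY printed it. */
			if (!n->ended) {
				put(o, n->close, 0);
				o->nospace = 0;
			}
			break;
		case N_TEXT:
			put(o, n->text, 1);
			o->nospace = 0;
			break;
		case N_ENDBODY:
			/* End the formatting of the pending body early. */
			assert(n->pending != NULL);
			put(o, n->pending->close, 0);
			n->pending->ended = 1;
			o->nospace = 0;
			break;
		}
	}
}

/* Render the node list starting at first into dst. */
void
demo_render(struct nest_node *first, char *dst, size_t dstsz)
{
	struct out	o = { "", 0 };

	render(&o, first);
	snprintf(dst, dstsz, "%s", o.buf);
}
```

Built by hand for the example above (an Ao body holding TEXT ao and the Bo
block, a Bo body holding TEXT bo, TEXT ac, the ENDBODY node and TEXT bc,
followed by TEXT end), this tree renders as
.Dq <ao [bo ac> bc] end .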
Support for badly-nested blocks is only provided for backward
compatibility with some older
Using badly-nested blocks is
.Em strongly discouraged ;
are unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .