1 .\" $Id: mandoc.3,v 1.42 2018/08/23 19:33:27 schwarze Exp $
3 .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4 .\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
6 .\" Permission to use, copy, modify, and distribute this software for any
7 .\" purpose with or without fee is hereby granted, provided that the above
8 .\" copyright notice and this permission notice appear in all copies.
10 .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11 .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12 .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13 .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14 .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15 .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16 .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
18 .Dd $Mdocdate: August 23 2018 $
37 .Nd mandoc macro compiler library
42 .Fd "#define ASCII_NBRSP"
43 .Fd "#define ASCII_HYPH"
44 .Fd "#define ASCII_BREAK"
48 .Fa "enum mandocerr mmin"
50 .Fa "enum mandoc_os oe_e"
55 .Fa "enum mandocerr errtype"
56 .Fa "enum mandoclevel level"
57 .Fa "const char *file"
64 .Fa "struct mparse *parse"
68 .Fa "const struct mparse *parse"
72 .Fa "struct mparse *parse"
73 .Fa "const char *fname"
75 .Ft "enum mandoclevel"
77 .Fa "struct mparse *parse"
79 .Fa "const char *fname"
83 .Fa "struct mparse *parse"
87 .Fa "struct mparse *parse"
88 .Fa "struct roff_man **man"
97 .Fa "enum mandoclevel"
101 .Fa "struct mparse *parse"
102 .Fa "enum mandoclevel *rc"
108 .Fa "const struct roff_node *node"
113 .Vt extern const char * const * mdoc_argnames;
114 .Vt extern const char * const * mdoc_macronames;
117 .Fa "struct roff_man *mdoc"
122 .Vt extern const char * const * man_macronames;
125 .Fa "struct roff_man *man"
132 manual into an abstract syntax tree (AST).
134 manuals are composed of
138 and may be mixed with
145 The following describes a general parse sequence:
148 initiate a parsing sequence with
164 retrieve the syntax tree with
167 depending on whether the
169 member of the returned
181 if information about the validity of the input is needed, fetch it with
182 .Fn mparse_updaterc ;
184 iterate over parse nodes, starting from the
186 member of the returned
187 .Vt struct roff_man ;
189 free all allocated memory with
195 and go back to step 2 to parse new files.
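.Pp
The iteration over parse nodes can be sketched as a depth-first walk.
This is only an illustration under stated assumptions:
.Vt struct node
is a hypothetical stand-in that models nothing but child and sibling
links, and
.Fn walk
is not a library function.
Real code would operate on the
.Vt struct roff_node
objects of the parse tree instead.
.Bd -literal -offset indent
#include <stdio.h>

/* Hypothetical stand-in for a parse node; only the links are modelled. */
struct node {
	const char	*label;
	struct node	*child;	/* first child, or NULL */
	struct node	*next;	/* next sibling, or NULL */
};

/* Visit a node, then its children, then its following siblings. */
static void
walk(const struct node *n, int depth)
{
	for (; n != NULL; n = n->next) {
		printf("%*s", depth * 2, "");
		puts(n->label);
		walk(n->child, depth + 1);
	}
}

int
main(void)
{
	struct node text = { "TEXT", NULL, NULL };
	struct node elem = { "ELEMENT", &text, NULL };
	struct node root = { "ROOT", &elem, NULL };

	walk(&root, 0);
	return 0;
}
.Ed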
198 This section documents the functions, types, and variables available
201 with the exception of those documented in
207 .It Vt "enum mandocerr"
208 An error or warning message generated during parsing.
209 .It Vt "enum mandoclevel"
210 A classification of an
212 as regards system operation.
213 See the DIAGNOSTICS section in
215 regarding the meanings of the levels.
216 .It Vt "struct mparse"
217 An opaque pointer to a running parse sequence.
222 This may be used across parsed input if
224 is called between parses.
226 A prototype for a function to handle error and warning
227 messages emitted by the parser.
232 Obtain a text-only representation of a
233 .Vt struct roff_node ,
234 including text contained in its child nodes.
235 To be used on children of the
238 .Vt struct roff_man .
239 When it is no longer needed, the pointer returned from
246 parse tree obtained with
255 parse tree obtained with
263 The arguments have the following effect:
264 .Bl -tag -offset 5n -width inttype
270 bit is set, only that parser is used.
271 Otherwise, the document type is automatically detected.
278 file inclusion requests are always honoured.
279 Otherwise, if the request is the only content in an input file,
280 only the file name is remembered, to be returned in the
287 bit is set, parsing is aborted after the NAME section.
288 This is useful, for example, in
291 to quickly build minimal databases.
295 .Dv MANDOCERR_STYLE ,
296 .Dv MANDOCERR_WARNING ,
297 .Dv MANDOCERR_ERROR ,
298 .Dv MANDOCERR_UNSUPP ,
301 Messages below the selected level will be suppressed.
303 A callback function to handle errors and warnings.
307 If printing of error messages is not desired,
311 Operating system to check base system conventions for.
313 .Dv MANDOC_OS_OTHER ,
314 the system is automatically detected from
320 A default string for the
323 macro, overriding the
325 preprocessor definition and the results of
332 The same parser may be used for multiple files so long as
334 is called between parses.
336 must be called to free the memory allocated by this function.
342 Free all memory allocated by
349 Dump a copy of the input to the standard output; used for
350 .Fl man T Ns Cm man .
356 Open the file for reading.
359 does not already end in
361 try again after appending
363 Remember whether the file is zipped.
364 Return a file descriptor open for reading or -1 on failure.
373 Parse a file descriptor opened with
377 Pass the associated filename in
379 This function may be called multiple times with different parameters; however,
383 should be invoked between parses.
389 Reset a parser so that
397 Obtain the result of a parse.
398 One of the two pointers will be filled in.
403 .It Fn mparse_strerror
404 Return a statically-allocated string representation of an error code.
409 .It Fn mparse_strlevel
410 Return a statically-allocated string representation of a level code.
415 .It Fn mparse_updaterc
416 If the highest warning or error level that occurred during the current
423 This is useful after calling
434 .It Va man_macronames
435 The string representation of a
440 The string representation of an
442 macro argument as indexed by
443 .Vt "enum mdocargt" .
444 .It Va mdoc_macronames
445 The string representation of an
450 .Sh IMPLEMENTATION NOTES
451 This section consists of structural documentation for
455 syntax trees and strings.
456 .Ss Man and Mdoc Strings
457 Strings may be extracted from mdoc and man meta-data, or from text
458 nodes (MDOC_TEXT and MAN_TEXT, respectively).
459 These strings have special non-printing formatting cues embedded in the
460 text itself, as well as
462 escapes preserved from input.
463 Implementing systems will need to handle both situations to produce
465 In general, strings may be assumed to consist of 7-bit ASCII characters.
467 The following non-printing characters may be embedded in text strings:
470 A non-breaking space character.
474 A breakable zero-width space.
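.Pp
Handling of these characters can be sketched as follows.
This is a minimal illustration, not part of the library:
the numeric values are assumptions standing in for the real definitions
of
.Dv ASCII_NBRSP ,
.Dv ASCII_HYPH ,
and
.Dv ASCII_BREAK ,
and
.Fn print_plain
is a hypothetical helper.
Escape sequences are not interpreted here and would need separate
handling.
.Bd -literal -offset indent
#include <stdio.h>

/* Assumed values; real code takes the definitions from the library. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31	/* non-breaking space */
#define ASCII_HYPH	30	/* breakable hyphen */
#define ASCII_BREAK	29	/* breakable zero-width space */
#endif

/* Print a text string, mapping the formatting cues to plain ASCII. */
static void
print_plain(const char *s)
{
	for (; *s; s++) {
		switch (*s) {
		case ASCII_NBRSP:
			putchar(' ');
			break;
		case ASCII_HYPH:
			putchar('-');
			break;
		case ASCII_BREAK:
			break;	/* zero width: emit nothing */
		default:
			putchar(*s);
			break;
		}
	}
}

int
main(void)
{
	const char demo[] = { 'a', ASCII_NBRSP, 'b', ASCII_HYPH, 'c', 0 };

	print_plain(demo);
	puts("");
	return 0;
}
.Ed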
477 Escape characters are also passed verbatim into text strings.
478 An escape character is a sequence of characters beginning with the
481 To construct human-readable text, these should be intercepted with
483 and converted with one of the functions described in
485 .Ss Man Abstract Syntax Tree
486 This AST is governed by the ontological rules dictated in
488 and derives its terminology accordingly.
490 The AST is composed of
492 nodes with element, root and text types as declared by the
495 Each node also provides its parse point (the
500 fields), its position in the tree (the
506 fields) and some type-specific data.
508 The tree itself is arranged according to the following normal form,
509 where capitalised non-terminals represent nodes.
511 .Bl -tag -width "ELEMENTXX" -compact
515 \(<- ELEMENT | TEXT | BLOCK
528 The only elements capable of nesting other elements are those with
529 next-line scope as documented in
531 .Ss Mdoc Abstract Syntax Tree
532 This AST is governed by the ontological
535 and derives its terminology accordingly.
537 elements described in
539 are described simply as
542 The AST is composed of
544 nodes with block, head, body, element, root and text types as declared
548 Each node also provides its parse point (the
553 fields), its position in the tree (the
560 fields) and some type-specific data, in particular, for nodes generated
561 from macros, the generating macro in the
565 The tree itself is arranged according to the following normal form,
566 where capitalised non-terminals represent nodes.
568 .Bl -tag -width "ELEMENTXX" -compact
572 \(<- BLOCK | ELEMENT | TEXT
574 \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
580 \(<- mnode* [ENDBODY mnode*]
587 Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
588 the BLOCK production: these refer to punctuation marks.
589 Furthermore, although a TEXT node will generally have a non-zero-length
590 string, in the specific case of
591 .Sq \&.Bd \-literal ,
592 an empty line will produce a zero-length string.
593 Multiple body parts are only found in invocations of
595 where a new body introduces a new phrase.
599 syntax tree accommodates broken block structures as well.
600 The ENDBODY node is available to end the formatting associated
601 with a given block before the physical end of that block.
604 field, is of the BODY
608 as the BLOCK it is ending, and has a
610 field pointing to that BLOCK's BODY node.
611 It is an indirect child of that BODY node
612 and has no children of its own.
614 An ENDBODY node is generated when a block ends while one of its child
615 blocks is still open, as in the following example:
616 .Bd -literal -offset indent
623 This example results in the following block structure:
624 .Bd -literal -offset indent
629 BLOCK Bo, pending -> Ao
634 ENDBODY Ao, pending -> Ao
639 Here, the formatting of the
641 block extends from TEXT ao to TEXT ac,
642 while the formatting of the
644 block extends from TEXT bo to TEXT bc.
645 It renders as follows in
649 .Dl <ao [bo ac> bc] end
651 Support for badly-nested blocks is only provided for backward
652 compatibility with some older
655 Using badly-nested blocks is
656 .Em strongly discouraged ;
661 is unable to render them in any meaningful way.
662 Furthermore, behaviour when encountering badly-nested blocks is not
663 consistent across troff implementations, especially when using multiple
664 levels of badly-nested blocks.
668 .Xr mandoc_escape 3 ,
669 .Xr mandoc_headers 3 ,
670 .Xr mandoc_malloc 3 ,
684 library was written by
685 .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
687 .An Ingo Schwarze Aq Mt schwarze@openbsd.org .