.\" $Id: mandoc.3,v 1.44 2018/12/30 00:49:55 schwarze Exp $
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.Dd $Mdocdate: December 30 2018 $
.Nd mandoc macro compiler library
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Fa "enum mandoc_os oe_e"
.Fa "struct mparse *parse"
.Fa "const struct mparse *parse"
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fa "struct mparse *parse"
.Ft struct roff_meta *
.Fa "struct mparse *parse"
.Fa "const struct roff_node *node"
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.Vt extern const char * const * man_macronames;
manual into an abstract syntax tree (AST).
manuals are composed of
The following describes a general parse sequence:
initiate a parsing sequence with
retrieve the syntax tree with
if information about the validity of the input is needed, fetch it with
.Fn mparse_updaterc ;
iterate over parse nodes, starting from the
member of the returned
.Vt struct roff_meta ;
free all allocated memory with
and go back to step 2 to parse new files.
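The node iteration step of the sequence above can be sketched as a depth-first walk.
This is only an illustration under stated assumptions: struct sketch_node is a hypothetical stand-in that models nothing but child and sibling links and a text payload, and sketch_walk is not a library function.
Real code would walk the struct roff_node tree obtained from the parse result instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a parse node: only the links and the
 * text payload needed for a walk are modelled here.
 */
struct sketch_node {
	struct sketch_node	*child;   /* first child, or NULL */
	struct sketch_node	*next;    /* next sibling, or NULL */
	const char		*string;  /* text of a text node, else NULL */
};

/*
 * Visit a node, its later siblings and all their descendants
 * depth-first.  If out is not NULL, print one indented line per node.
 * Returns the number of nodes visited.
 */
size_t
sketch_walk(const struct sketch_node *n, int depth, FILE *out)
{
	size_t	 count = 0;

	assert(depth >= 0);
	for (; n != NULL; n = n->next) {
		count++;
		if (out != NULL)
			fprintf(out, "%*s%s\n", depth * 2, "",
			    n->string != NULL ? n->string : "(node)");
		/* Descend before moving on to the next sibling. */
		count += sketch_walk(n->child, depth + 1, out);
	}
	return count;
}
```

Calling sketch_walk(root, 0, stdout) prints one line per node, indented by depth; passing NULL for out only counts the nodes.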
This section documents the functions, types, and variables available
with the exception of those documented in
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
as regards system operation.
See the DIAGNOSTICS section in
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
This may be used across parsed input if
is called between parses.
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Vt struct roff_meta .
When it is no longer needed, the pointer returned from
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
.Vt struct roff_meta .
bit is set, parsing is aborted after the NAME section.
This is useful, for example, in
to quickly build minimal databases.
runs the validation functions before returning the syntax tree.
This is almost always required, except in certain debugging scenarios,
for example to dump unvalidated syntax trees.
Operating system to check base system conventions for.
.Dv MANDOC_OS_OTHER ,
the system is automatically detected from
A default string for the
macro, overriding the
preprocessor definition and the results of
The same parser may be used for multiple files so long as
is called between parses.
must be called to free the memory allocated by this function.
Free all memory allocated by
Dump a copy of the input to the standard output; used for
.Fl man T Ns Cm man .
Open the file for reading.
does not already end in
try again after appending
Record whether the file is zipped.
Return a file descriptor open for reading or -1 on failure.
Parse a file descriptor opened with
Pass the associated filename in
This function may be called multiple times with different parameters; however,
should be invoked between parses.
Reset a parser so that
Obtain the result of a parse.
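The retry-with-suffix behaviour described above for opening input files can be sketched in plain POSIX C.
This is a minimal illustration and not the library's implementation: sketch_open, its gz output parameter and the fixed-size buffer are assumptions made for this example, and it presumes that the retry happens only after the first open attempt has failed.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Open fname for reading.  If that fails and fname does not already
 * end in ".gz", retry once with ".gz" appended.  On success, *gz is
 * set to 1 if the name that was opened ends in ".gz", else to 0.
 * Returns a file descriptor, or -1 on failure.
 */
int
sketch_open(const char *fname, int *gz)
{
	char	 buf[1024];
	size_t	 len;
	int	 fd;

	assert(fname != NULL && gz != NULL);
	len = strlen(fname);
	*gz = len > 3 && strcmp(fname + len - 3, ".gz") == 0;

	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;
	if (*gz)
		return -1;	/* suffix already present; nothing to retry */

	/* Build "fname.gz", refusing names that would be truncated. */
	if ((size_t)snprintf(buf, sizeof(buf), "%s.gz", fname) >= sizeof(buf))
		return -1;
	if ((fd = open(buf, O_RDONLY)) == -1)
		return -1;
	*gz = 1;
	return fd;
}
```

A caller would close the returned descriptor when done and would use the gz flag to decide whether the data needs decompressing before parsing.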
.It Va man_macronames
The string representation of a
The string representation of an
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
escapes preserved from input.
Implementing systems will need to handle both situations to produce
In general, strings may be assumed to consist of 7-bit ASCII characters.
The following non-printing characters may be embedded in text strings:
A non-breaking space character.
A breakable zero-width space.
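Handling of the non-printing cues listed above can be sketched as a small filter that produces plain text.
The numeric values of the SKETCH_ macros are assumptions made for illustration only; real code should include the library header and use ASCII_NBRSP, ASCII_HYPH and ASCII_BREAK from there.
Rendering the hyphen cue as an ordinary hyphen-minus is likewise a choice made for this example, and sketch_plain is not a library function.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values, assumed for this sketch only. */
#define SKETCH_NBRSP	31	/* non-breaking space */
#define SKETCH_HYPH	30	/* hyphen cue */
#define SKETCH_BREAK	29	/* breakable zero-width space */

/*
 * Copy src to dst, replacing formatting cues by printable characters.
 * dst must have room for strlen(src) + 1 bytes.  Backslash escapes
 * are copied through untouched.  Returns the length of the result.
 */
size_t
sketch_plain(char *dst, const char *src)
{
	size_t	 n = 0;

	assert(dst != NULL && src != NULL);
	for (; *src != '\0'; src++) {
		switch (*src) {
		case SKETCH_NBRSP:
			dst[n++] = ' ';
			break;
		case SKETCH_HYPH:
			dst[n++] = '-';
			break;
		case SKETCH_BREAK:
			break;		/* zero width: emit nothing */
		default:
			dst[n++] = *src;
			break;
		}
	}
	dst[n] = '\0';
	return n;
}
```

A formatter that performs line filling would treat the three cues differently when choosing break points; this filter only shows how to recognise them.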
Escape characters are also passed verbatim into text strings.
An escape character is a sequence of characters beginning with the
To construct human-readable text, these should be intercepted with
and converted with one of the functions described in
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
and derives its terminology accordingly.
The AST is composed of
nodes with element, root and text types as declared by the
Each node also provides its parse point (the
fields), its position in the tree (the
fields) and some type-specific data.
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Bl -tag -width "ELEMENTXX" -compact
\(<- ELEMENT | TEXT | BLOCK
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
and derives its terminology accordingly.
elements described in
are described simply as
The AST is composed of
nodes with block, head, body, element, root and text types as declared
Each node also provides its parse point (the
fields), its position in the tree (the
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Bl -tag -width "ELEMENTXX" -compact
\(<- BLOCK | ELEMENT | TEXT
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
\(<- mnode* [ENDBODY mnode*]
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
where a new body introduces a new phrase.
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
field, is of the BODY
as the BLOCK it is ending, and has a
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Bo, pending -> Ao
ENDBODY Ao, pending -> Ao
Here, the formatting of the
block extends from TEXT ao to TEXT ac,
while the formatting of the
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Dl <ao [bo ac> bc] end
Support for badly-nested blocks is only provided for backward
compatibility with some older
Using badly-nested blocks is
.Em strongly discouraged ;
is unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .