1 .\" $Id: mandoc.3,v 1.43 2018/12/14 01:18:25 schwarze Exp $
3 .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4 .\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
6 .\" Permission to use, copy, modify, and distribute this software for any
7 .\" purpose with or without fee is hereby granted, provided that the above
8 .\" copyright notice and this permission notice appear in all copies.
10 .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11 .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12 .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13 .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14 .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15 .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16 .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
18 .Dd $Mdocdate: December 14 2018 $
33 .Nd mandoc macro compiler library
39 .Fd "#define ASCII_NBRSP"
40 .Fd "#define ASCII_HYPH"
41 .Fd "#define ASCII_BREAK"
45 .Fa "enum mandoc_os os_e"
50 .Fa "struct mparse *parse"
54 .Fa "const struct mparse *parse"
58 .Fa "struct mparse *parse"
59 .Fa "const char *fname"
63 .Fa "struct mparse *parse"
65 .Fa "const char *fname"
69 .Fa "struct mparse *parse"
73 .Fa "struct mparse *parse"
74 .Fa "struct roff_man **man"
81 .Fa "const struct roff_node *node"
86 .Vt extern const char * const * mdoc_argnames;
87 .Vt extern const char * const * mdoc_macronames;
90 .Fa "struct roff_man *mdoc"
95 .Vt extern const char * const * man_macronames;
98 .Fa "struct roff_man *man"
105 manual into an abstract syntax tree (AST).
107 manuals are composed of
111 and may be mixed with
118 The following describes a general parse sequence:
121 initiate a parsing sequence with
137 retrieve the syntax tree with
140 depending on whether the
142 member of the returned
154 if information about the validity of the input is needed, fetch it with
155 .Fn mparse_updaterc ;
157 iterate over parse nodes, starting from the
159 member of the returned
160 .Vt struct roff_man ;
162 free all allocated memory with
168 and go back to step 2 to parse new files.
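.Pp
The iteration step above amounts to a walk over child and sibling links.
The following is a minimal sketch of such a walk, not the library's own code.
It uses a stand-in node type,
.Vt struct wk_node ,
whose field names are assumptions made for illustration only.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Stand-in node type for this sketch; the names are hypothetical. */
enum wk_type { WK_ROOT, WK_BLOCK, WK_ELEM, WK_TEXT };

struct wk_node {
    enum wk_type type;
    const char *string;        /* text payload, TEXT nodes only */
    struct wk_node *child;     /* first child, or NULL */
    struct wk_node *next;      /* next sibling, or NULL */
};

/*
 * Visit n, its descendants, and its following siblings in document
 * order, calling fn on each node.  Returns the number of nodes visited.
 */
size_t
wk_walk(const struct wk_node *n,
    void (*fn)(const struct wk_node *, void *), void *arg)
{
    size_t count = 0;

    for (; n != NULL; n = n->next) {
        if (fn != NULL)
            fn(n, arg);
        count++;
        count += wk_walk(n->child, fn, arg);
    }
    return count;
}

/* Example callback: count TEXT nodes into the size_t that arg points to. */
void
wk_count_text(const struct wk_node *n, void *arg)
{
    assert(arg != NULL);
    if (n->type == WK_TEXT)
        (*(size_t *)arg)++;
}
```
.Ed
.Pp
A caller would pass the first node of a parsed tree together with a
callback of its own; the callback above merely counts text nodes.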
171 This section documents the functions, types, and variables available
174 with the exception of those documented in
180 .It Vt "enum mandocerr"
181 An error or warning message during parsing.
182 .It Vt "enum mandoclevel"
183 A classification of an
185 as regards system operation.
186 See the DIAGNOSTICS section in
188 regarding the meanings of the levels.
189 .It Vt "struct mparse"
190 An opaque pointer to a running parse sequence.
195 This may be used across parsed input if
197 is called between parses.
202 Obtain a text-only representation of a
203 .Vt struct roff_node ,
204 including text contained in its child nodes.
205 To be used on children of the
208 .Vt struct roff_man .
209 When it is no longer needed, the pointer returned from
216 parse tree obtained with
225 parse tree obtained with
233 The arguments have the following effect:
234 .Bl -tag -offset 5n -width inttype
240 bit is set, only that parser is used.
241 Otherwise, the document type is automatically detected.
248 file inclusion requests are always honoured.
249 Otherwise, if the request is the only content in an input file,
250 only the file name is remembered, to be returned in the
257 bit is set, parsing is aborted after the NAME section.
258 This is useful, for example, in
261 to quickly build minimal databases.
263 Operating system to check base system conventions for.
265 .Dv MANDOC_OS_OTHER ,
266 the system is automatically detected from
272 A default string for the
275 macro, overriding the
277 preprocessor definition and the results of
284 The same parser may be used for multiple files so long as
286 is called between parses.
288 must be called to free the memory allocated by this function.
294 Free all memory allocated by
301 Dump a copy of the input to the standard output; used for
302 .Fl man T Ns Cm man .
308 Open the file for reading.
311 does not already end in
313 try again after appending
315 Record whether the file is zipped.
316 Return a file descriptor open for reading or -1 on failure.
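.Pp
The open-and-retry behaviour described above can be sketched as follows.
This is an illustration under stated assumptions, not the library's
implementation: the helper names
.Fn sfx_ends_with
and
.Fn sfx_open
are hypothetical, and the suffix is passed as a parameter instead of
being hard-coded.
.Bd -literal -offset indent
```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if s ends in sfx, 0 otherwise (hypothetical helper). */
static int
sfx_ends_with(const char *s, const char *sfx)
{
    size_t ls = strlen(s), lx = strlen(sfx);

    return ls >= lx && strcmp(s + ls - lx, sfx) == 0;
}

/*
 * Open fname for reading.  If that fails and fname does not already
 * end in sfx, try once more with sfx appended.  *used_sfx records
 * whether the name that was finally opened, or the original name on
 * failure, ends in sfx.  Returns a descriptor, or -1 on failure.
 */
int
sfx_open(const char *fname, const char *sfx, int *used_sfx)
{
    char path[1024];
    int fd, n;

    *used_sfx = sfx_ends_with(fname, sfx);
    if ((fd = open(fname, O_RDONLY)) != -1)
        return fd;
    if (*used_sfx)
        return -1;

    n = snprintf(path, sizeof(path), "%s%s", fname, sfx);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    if ((fd = open(path, O_RDONLY)) != -1)
        *used_sfx = 1;
    return fd;
}
```
.Ed
.Pp
The caller owns the returned descriptor and is expected to close it.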
325 Parse a file descriptor opened with
329 Pass the associated filename in
331 This function may be called multiple times with different parameters; however,
335 should be invoked between parses.
341 Reset a parser so that
349 Obtain the result of a parse.
350 One of the two pointers will be filled in.
358 .It Va man_macronames
359 The string representation of a
364 The string representation of an
366 macro argument as indexed by
367 .Vt "enum mdocargt" .
368 .It Va mdoc_macronames
369 The string representation of an
374 .Sh IMPLEMENTATION NOTES
375 This section consists of structural documentation for
379 syntax trees and strings.
380 .Ss Man and Mdoc Strings
381 Strings may be extracted from mdoc and man meta-data, or from text
382 nodes (MDOC_TEXT and MAN_TEXT, respectively).
383 These strings have special non-printing formatting cues embedded in the
384 text itself, as well as
386 escapes preserved from input.
387 Implementing systems will need to handle both situations to produce
389 In general, strings may be assumed to consist of 7-bit ASCII characters.
391 The following non-printing characters may be embedded in text strings:
394 A non-breaking space character.
398 A breakable zero-width space.
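.Pp
A formatter producing plain text has to map these characters to
something printable.
The following sketch shows one possible mapping.
The numeric values are placeholders chosen for this example, and
rendering the hyphen cue as
.Sq -
is likewise an assumption; real code should use the definitions from
the library header.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for this sketch; take the real ones from the header. */
#define NP_NBRSP 31
#define NP_HYPH  30
#define NP_BREAK 29

/*
 * Copy in to out, replacing the non-breaking space cue by a blank and
 * the hyphen cue by a minus sign, and dropping the zero-width break
 * cue.  At most outsz - 1 bytes are written, followed by a NUL.
 * Returns the number of bytes written, not counting the NUL.
 */
size_t
np_plain(const char *in, char *out, size_t outsz)
{
    size_t n = 0;

    assert(in != NULL && out != NULL && outsz > 0);
    for (; *in != 0 && n + 1 < outsz; in++) {
        if (*in == NP_NBRSP)
            out[n++] = ' ';
        else if (*in == NP_HYPH)
            out[n++] = '-';
        else if (*in != NP_BREAK)
            out[n++] = *in;
    }
    out[n] = 0;
    return n;
}
```
.Ed
.Pp
A formatter that wraps lines would instead treat the break cue as a
permitted break point and keep the non-breaking space glued to its
neighbours.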
401 Escape characters are also passed verbatim into text strings.
402 An escape character is a sequence of characters beginning with the
405 To construct human-readable text, these should be intercepted with
407 and converted with one of the functions described in
409 .Ss Man Abstract Syntax Tree
410 This AST is governed by the ontological rules dictated in
412 and derives its terminology accordingly.
414 The AST is composed of
416 nodes with element, root and text types as declared by the
419 Each node also provides its parse point (the
424 fields), its position in the tree (the
430 fields) and some type-specific data.
432 The tree itself is arranged according to the following normal form,
433 where capitalised non-terminals represent nodes.
435 .Bl -tag -width "ELEMENTXX" -compact
439 \(<- ELEMENT | TEXT | BLOCK
452 The only elements capable of nesting other elements are those with
453 next-line scope as documented in
455 .Ss Mdoc Abstract Syntax Tree
456 This AST is governed by the ontological
459 and derives its terminology accordingly.
461 elements described in
463 are described simply as
466 The AST is composed of
468 nodes with block, head, body, element, root and text types as declared
472 Each node also provides its parse point (the
477 fields), its position in the tree (the
484 fields) and some type-specific data, in particular, for nodes generated
485 from macros, the generating macro in the
489 The tree itself is arranged according to the following normal form,
490 where capitalised non-terminals represent nodes.
492 .Bl -tag -width "ELEMENTXX" -compact
496 \(<- BLOCK | ELEMENT | TEXT
498 \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
504 \(<- mnode* [ENDBODY mnode*]
511 Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
512 the BLOCK production: these refer to punctuation marks.
513 Furthermore, although a TEXT node will generally have a non-zero-length
514 string, in the specific case of
515 .Sq \&.Bd \-literal ,
516 an empty line will produce a zero-length string.
517 Multiple body parts are only found in invocations of
519 where a new body introduces a new phrase.
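.Pp
The BLOCK production above can be read as a small recognizer over the
node types of a block's children.
The following sketch checks a sequence of child types against
.Dq HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]] .
The enumeration and function names are hypothetical stand-ins, not the
library's declarations.
.Bd -literal -offset indent
```c
#include <stddef.h>

/* Stand-in child kinds for this sketch; the names are hypothetical. */
enum nf_kind { NF_HEAD, NF_BODY, NF_TAIL, NF_TEXT };

/*
 * Return 1 if the n child kinds in k match
 * HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]], 0 otherwise.
 */
int
nf_block_ok(const enum nf_kind *k, size_t n)
{
    size_t i = 0, bodies = 0;

    /* Exactly one HEAD, optionally followed by punctuation TEXT. */
    if (i >= n || k[i] != NF_HEAD)
        return 0;
    i++;
    if (i < n && k[i] == NF_TEXT)
        i++;

    /* One or more BODY nodes, each optionally followed by TEXT. */
    while (i < n && k[i] == NF_BODY) {
        i++;
        bodies++;
        if (i < n && k[i] == NF_TEXT)
            i++;
    }
    if (bodies == 0)
        return 0;

    /* Optional TAIL, optionally followed by TEXT. */
    if (i < n && k[i] == NF_TAIL) {
        i++;
        if (i < n && k[i] == NF_TEXT)
            i++;
    }

    /* Anything left over violates the production. */
    return i == n;
}
```
.Ed
.Pp
A check of this kind is mainly useful in tests or debugging code for a
formatter; it only covers the well-nested form of the production.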
523 syntax tree accommodates broken block structures as well.
524 The ENDBODY node is available to end the formatting associated
525 with a given block before the physical end of that block.
528 field, is of the BODY
532 as the BLOCK it is ending, and has a
534 field pointing to that BLOCK's BODY node.
535 It is an indirect child of that BODY node
536 and has no children of its own.
538 An ENDBODY node is generated when a block ends while one of its child
539 blocks is still open, as in the following example:
540 .Bd -literal -offset indent
547 This example results in the following block structure:
548 .Bd -literal -offset indent
553 BLOCK Bo, pending -> Ao
558 ENDBODY Ao, pending -> Ao
563 Here, the formatting of the
565 block extends from TEXT ao to TEXT ac,
566 while the formatting of the
568 block extends from TEXT bo to TEXT bc.
569 It renders as follows in
573 .Dl <ao [bo ac> bc] end
575 Support for badly-nested blocks is only provided for backward
576 compatibility with some older
579 Using badly-nested blocks is
580 .Em strongly discouraged ;
585 is unable to render them in any meaningful way.
586 Furthermore, behaviour when encountering badly-nested blocks is not
587 consistent across troff implementations, especially when using multiple
588 levels of badly-nested blocks.
592 .Xr mandoc_escape 3 ,
593 .Xr mandoc_headers 3 ,
594 .Xr mandoc_malloc 3 ,
608 library was written by
609 .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
611 .An Ingo Schwarze Aq Mt schwarze@openbsd.org .