.\" $Id: mandoc.3,v 1.18 2013/06/02 03:48:26 schwarze Exp $
.\"
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 2 2013 $
.Dt MANDOC 3
.Os
.Sh NAME
.Nm mandoc ,
.Nm mandoc_escape ,
.Nm man_meta ,
.Nm man_mparse ,
.Nm man_node ,
.Nm mchars_alloc ,
.Nm mchars_free ,
.Nm mchars_num2char ,
.Nm mchars_num2uc ,
.Nm mchars_spec2cp ,
.Nm mchars_spec2str ,
.Nm mdoc_meta ,
.Nm mdoc_node ,
.Nm mparse_alloc ,
.Nm mparse_free ,
.Nm mparse_getkeep ,
.Nm mparse_keep ,
.Nm mparse_readfd ,
.Nm mparse_reset ,
.Nm mparse_result ,
.Nm mparse_strerror ,
.Nm mparse_strlevel
.Nd mandoc macro compiler library
.Sh LIBRARY
.Lb mandoc
.Sh SYNOPSIS
.In man.h
.In mdoc.h
.In mandoc.h
.Ft "enum mandoc_esc"
.Fo mandoc_escape
.Fa "const char **end"
.Fa "const char **start"
.Fa "int *sz"
.Fc
.Ft "const struct man_meta *"
.Fo man_meta
.Fa "const struct man *man"
.Fc
.Ft "const struct mparse *"
.Fo man_mparse
.Fa "const struct man *man"
.Fc
.Ft "const struct man_node *"
.Fo man_node
.Fa "const struct man *man"
.Fc
.Ft "struct mchars *"
.Fn mchars_alloc
.Ft void
.Fn mchars_free "struct mchars *p"
.Ft char
.Fn mchars_num2char "const char *cp" "size_t sz"
.Ft int
.Fn mchars_num2uc "const char *cp" "size_t sz"
.Ft "const char *"
.Fo mchars_spec2str
.Fa "const struct mchars *p"
.Fa "const char *cp"
.Fa "size_t sz"
.Fa "size_t *rsz"
.Fc
.Ft int
.Fo mchars_spec2cp
.Fa "const struct mchars *p"
.Fa "const char *cp"
.Fa "size_t sz"
.Fc
.Ft "const struct mdoc_meta *"
.Fo mdoc_meta
.Fa "const struct mdoc *mdoc"
.Fc
.Ft "const struct mdoc_node *"
.Fo mdoc_node
.Fa "const struct mdoc *mdoc"
.Fc
.Ft "struct mparse *"
.Fo mparse_alloc
.Fa "enum mparset type"
.Fa "enum mandoclevel wlevel"
.Fa "mandocmsg msg"
.Fa "void *msgarg"
.Fc
.Ft void
.Fo mparse_free
.Fa "struct mparse *parse"
.Fc
.Ft "const char *"
.Fo mparse_getkeep
.Fa "const struct mparse *parse"
.Fc
.Ft void
.Fo mparse_keep
.Fa "struct mparse *parse"
.Fc
.Ft "enum mandoclevel"
.Fo mparse_readfd
.Fa "struct mparse *parse"
.Fa "int fd"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_reset
.Fa "struct mparse *parse"
.Fc
.Ft void
.Fo mparse_result
.Fa "struct mparse *parse"
.Fa "struct mdoc **mdoc"
.Fa "struct man **man"
.Fc
.Ft "const char *"
.Fo mparse_strerror
.Fa "enum mandocerr"
.Fc
.Ft "const char *"
.Fo mparse_strlevel
.Fa "enum mandoclevel"
.Fc
.Vt extern const char * const * man_macronames;
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Sh DESCRIPTION
The
.Nm mandoc
library parses a
.Ux
manual into an abstract syntax tree (AST).
.Ux
manuals are composed of
.Xr mdoc 7
or
.Xr man 7 ,
and may be mixed with
.Xr roff 7 ,
.Xr tbl 7 ,
and
.Xr eqn 7
invocations.
.Pp
The following describes a general parse sequence:
.Bl -enum
.It
initiate a parsing sequence with
.Fn mparse_alloc ;
.It
parse files or file descriptors with
.Fn mparse_readfd ;
.It
retrieve a parsed syntax tree, if the parse was successful, with
.Fn mparse_result ;
.It
iterate over parse nodes with
.Fn mdoc_node
or
.Fn man_node ;
.It
free all allocated memory with
.Fn mparse_free ,
or invoke
.Fn mparse_reset
and parse new files.
.El
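.Pp
The sequence above might be sketched as follows.
This is an illustrative outline rather than code taken from the library.
It calls
.Fn mparse_alloc
with the four arguments listed in the
.Sx SYNOPSIS ,
which may differ in other versions of the library, and it passes a
.Dv NULL
message handler on the assumption that diagnostics are then simply
not reported.
.Bd -literal -offset indent
#include <stdio.h>
#include <mandoc.h>

int
main(int argc, char *argv[])
{
    struct mparse *mp;
    struct mdoc *mdoc;
    struct man *man;
    int i;

    /* Assumption: a NULL handler means no messages are delivered. */
    mp = mparse_alloc(MPARSE_AUTO, MANDOCLEVEL_FATAL, NULL, NULL);

    for (i = 1; i < argc; i++) {
        mdoc = NULL;
        man = NULL;
        /* An fd of -1 asks the parser to open the named file. */
        if (mparse_readfd(mp, -1, argv[i]) < MANDOCLEVEL_FATAL) {
            /* At most one of the two pointers is filled in. */
            mparse_result(mp, &mdoc, &man);
            if (mdoc != NULL)
                printf("%s: mdoc\en", argv[i]);
            else if (man != NULL)
                printf("%s: man\en", argv[i]);
        }
        /* Required before the parser reads another file. */
        mparse_reset(mp);
    }

    mparse_free(mp);
    return 0;
}
.Ed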
.Pp
The
.Nm
library also contains routines for translating character strings into glyphs
.Pq see Fn mchars_alloc
and parsing escape sequences from strings
.Pq see Fn mandoc_escape .
.Sh REFERENCE
This section documents the functions, types, and variables available
via
.In man.h ,
.In mdoc.h ,
and
.In mandoc.h .
.Ss Types
.Bl -ohang
.It Vt "enum mandoc_esc"
An escape sequence classification.
.It Vt "enum mandocerr"
A fatal error, error, or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
.Vt "enum mandocerr"
as regards system operation.
.It Vt "struct mchars"
An opaque pointer to an object allowing for translation between
character strings and glyphs.
See
.Fn mchars_alloc .
.It Vt "enum mparset"
The type of parser when reading input.
This should usually be
.Dv MPARSE_AUTO
for auto-detection.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
Created with
.Fn mparse_alloc
and freed with
.Fn mparse_free .
This may be used across parsed input if
.Fn mparse_reset
is called between parses.
.It Vt "mandocmsg"
A prototype for a function to handle fatal error, error, and warning
messages emitted by the parser.
.El
.Ss Functions
.Bl -ohang
.It Fn mandoc_escape
Scan an escape sequence, i.e., a character string beginning with
.Sq \e .
Pass a pointer to the character after the
.Sq \e
as
.Va end ;
it will be set to the supremum of the parsed escape sequence unless
returning
.Dv ESCAPE_ERROR ,
in which case the string is bogus and should be
thrown away.
If not
.Dv ESCAPE_ERROR
or
.Dv ESCAPE_IGNORE ,
.Va start
is set to the first relevant character of the substring (font, glyph,
whatever) of length
.Va sz .
Both
.Va start
and
.Va sz
may be
.Dv NULL .
Declared in
.In mandoc.h ,
implemented in
.Pa mandoc.c .
.It Fn man_meta
Obtain the meta-data of a successful parse.
This may only be used on a pointer returned by
.Fn mparse_result .
Declared in
.In man.h ,
implemented in
.Pa man.c .
.It Fn man_mparse
Get the parser used for the current output.
Declared in
.In man.h ,
implemented in
.Pa man.c .
.It Fn man_node
Obtain the root node of a successful parse.
This may only be used on a pointer returned by
.Fn mparse_result .
Declared in
.In man.h ,
implemented in
.Pa man.c .
.It Fn mchars_alloc
Allocate an
.Vt "struct mchars *"
object for translating special characters into glyphs.
See
.Xr mandoc_char 7
for an overview of special characters.
The object must be freed with
.Fn mchars_free .
Declared in
.In mandoc.h ,
implemented in
.Pa chars.c .
.It Fn mchars_free
Free an object created with
.Fn mchars_alloc .
Declared in
.In mandoc.h ,
implemented in
.Pa chars.c .
.It Fn mchars_num2char
Convert a character index (e.g., the \eN\(aq\(aq escape) into a
printable ASCII character.
Returns \e0 (the nil character) if the input sequence is malformed.
Declared in
.In mandoc.h ,
implemented in
.Pa chars.c .
.It Fn mchars_num2uc
Convert a hexadecimal character index (e.g., the \e[uNNNN] escape) into
a Unicode codepoint.
Returns \e0 (the nil character) if the input sequence is malformed.
Declared in
.In mandoc.h ,
implemented in
.Pa chars.c .
.It Fn mchars_spec2cp
Convert a special character into a valid Unicode codepoint.
Returns \-1 on failure or a non-zero Unicode codepoint on success.
Declared in
.In mandoc.h ,
implemented in
.Pa chars.c .
.It Fn mchars_spec2str
Convert a special character into an ASCII string.
Returns
.Dv NULL
on failure.
Declared in
.In mandoc.h ,
implemented in
.Pa chars.c .
.It Fn mdoc_meta
Obtain the meta-data of a successful parse.
This may only be used on a pointer returned by
.Fn mparse_result .
Declared in
.In mdoc.h ,
implemented in
.Pa mdoc.c .
.It Fn mdoc_node
Obtain the root node of a successful parse.
This may only be used on a pointer returned by
.Fn mparse_result .
Declared in
.In mdoc.h ,
implemented in
.Pa mdoc.c .
.It Fn mparse_alloc
Allocate a parser.
The same parser may be used for multiple files so long as
.Fn mparse_reset
is called between parses.
.Fn mparse_free
must be called to free the memory allocated by this function.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_free
Free all memory allocated by
.Fn mparse_alloc .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_getkeep
Acquire the keep buffer.
Must follow a call of
.Fn mparse_keep .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_keep
Instruct the parser to retain a copy of its parsed input.
This can be acquired with subsequent
.Fn mparse_getkeep
calls.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_readfd
Parse a file or file descriptor.
If
.Va fd
is \-1,
.Va fname
is opened for reading.
Otherwise,
.Va fname
is assumed to be the name associated with
.Va fd .
This may be called multiple times with different parameters; however,
.Fn mparse_reset
should be invoked between parses.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_reset
Reset a parser so that
.Fn mparse_readfd
may be used again.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_result
Obtain the result of a parse.
Only successful parses
.Po
i.e., those where
.Fn mparse_readfd
returned less than
.Dv MANDOCLEVEL_FATAL
.Pc
should invoke this function, in which case one of the two pointers will
be filled in.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_strerror
Return a statically-allocated string representation of an error code.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_strlevel
Return a statically-allocated string representation of a level code.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.El
.Ss Variables
.Bl -ohang
.It Va man_macronames
The string representation of a man macro as indexed by
.Vt "enum mant" .
.It Va mdoc_argnames
The string representation of an mdoc macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an mdoc macro as indexed by
.Vt "enum mdoct" .
.El
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
.Xr mdoc 7
and
.Xr man 7
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
.Xr roff 7
escapes preserved from input.
Implementing systems will need to handle both situations to produce
human-readable text.
In general, strings may be assumed to consist of 7-bit ASCII characters.
.Pp
The following non-printing characters may be embedded in text strings:
.Bl -tag -width Ds
.It Dv ASCII_NBRSP
A non-breaking space character.
.It Dv ASCII_HYPH
A soft hyphen.
.El
.Pp
Escape sequences are also passed verbatim into text strings.
An escape sequence is a sequence of characters beginning with the
backslash
.Pq Sq \e .
To construct human-readable text, these should be intercepted with
.Fn mandoc_escape
and converted with one of
.Fn mchars_num2char ,
.Fn mchars_spec2str ,
and so on.
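.Pp
A minimal sketch of such a loop follows.
It is not the code used by
.Xr mandoc 1 .
The function name
.Fn print_text
is hypothetical, and the
.Dv ESCAPE_SPECIAL
and
.Dv ESCAPE_NUMBERED
classifications are assumed to be among those declared in
.In mandoc.h .
All other escapes are silently dropped, and the non-printing
characters listed above are passed through untranslated, which a real
formatter would need to handle.
.Bd -literal -offset indent
#include <stdio.h>
#include <mandoc.h>

/* mc is assumed to come from mchars_alloc(). */
static void
print_text(const struct mchars *mc, const char *cp)
{
    const char *seq, *str;
    size_t len;
    int sz;
    char c;

    while (*cp != '\e0') {
        if (*cp != '\e\e') {
            putchar((unsigned char)*cp++);
            continue;
        }
        cp++;   /* point at the character after the backslash */
        switch (mandoc_escape(&cp, &seq, &sz)) {
        case ESCAPE_ERROR:
            return;     /* bogus string: throw the rest away */
        case ESCAPE_SPECIAL:
            str = mchars_spec2str(mc, seq, sz, &len);
            if (str != NULL)
                fwrite(str, 1, len, stdout);
            break;
        case ESCAPE_NUMBERED:
            if ((c = mchars_num2char(seq, sz)) != '\e0')
                putchar(c);
            break;
        default:
            break;      /* fonts and other escapes: skipped */
        }
        /* cp now points past the escape sequence. */
    }
}
.Ed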
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
.Xr man 7
and derives its terminology accordingly.
.Pp
The AST is composed of
.Vt struct man_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va sec ,
and
.Va pos
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va next
and
.Va prev
fields) and some type-specific data.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- ELEMENT | TEXT | BLOCK
.It BLOCK
\(<- HEAD BODY
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode*
.It ELEMENT
\(<- ELEMENT | TEXT*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Xr man 7 .
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are described simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct mdoc_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va sec ,
and
.Va pos
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va nchild ,
.Va next
and
.Va prev
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
.Va tok
field.
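.Pp
As an illustration of these fields, the following hypothetical
.Fn walk
function prints an indented outline of a tree.
It is a sketch rather than library code.
It assumes that the node type constants
.Dv MDOC_BLOCK ,
.Dv MDOC_HEAD ,
.Dv MDOC_BODY ,
.Dv MDOC_TAIL
and
.Dv MDOC_ELEM ,
and a
.Va string
field holding the contents of text nodes, are declared in
.In mdoc.h
under those names.
.Bd -literal -offset indent
#include <stdio.h>
#include <mandoc.h>
#include <mdoc.h>

static void
walk(const struct mdoc_node *n, int depth)
{
    /* Visit each sibling in turn, descending into its children. */
    for (; n != NULL; n = n->next) {
        switch (n->type) {
        case MDOC_TEXT:
            printf("%*sTEXT %s\en", depth * 2, "", n->string);
            break;
        case MDOC_BLOCK:
        case MDOC_HEAD:
        case MDOC_BODY:
        case MDOC_TAIL:
        case MDOC_ELEM:
            /* tok identifies the generating macro. */
            printf("%*s%s\en", depth * 2, "",
                mdoc_macronames[n->tok]);
            break;
        default:
            break;      /* root and any other node types */
        }
        walk(n->child, depth + 1);
    }
}
.Ed
.Pp
Given a
.Vt "struct mdoc *"
filled in by
.Fn mparse_result ,
such a function might be invoked as
.Li walk(mdoc_node(mdoc), 0) .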
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&.Bl \-column ,
where a new body introduces a new phrase.
.Pp
The
.Xr mdoc 7
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Sq \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Sq \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged ;
for example, the
.Fl T Ns Cm html
and
.Fl T Ns Cm xhtml
front-ends to
.Xr mandoc 1
are unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr eqn 7 ,
.Xr man 7 ,
.Xr mandoc_char 7 ,
.Xr mdoc 7 ,
.Xr roff 7 ,
.Xr tbl 7
.Sh AUTHORS
The
.Nm
library was written by
.An Kristaps Dzonsons ,
.Mt kristaps@bsd.lv .