.\" $Id: mandoc.3,v 1.43 2018/12/14 01:18:25 schwarze Exp $
.\"
.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: December 14 2018 $
.Dt MANDOC 3
.Os
.Sh NAME
.Nm mandoc ,
.Nm deroff ,
.Nm man_validate ,
.Nm mdoc_validate ,
.Nm mparse_alloc ,
.Nm mparse_copy ,
.Nm mparse_free ,
.Nm mparse_open ,
.Nm mparse_readfd ,
.Nm mparse_reset ,
.Nm mparse_result
.Nd mandoc macro compiler library
.Sh SYNOPSIS
.In sys/types.h
.In stdio.h
.In mandoc.h
.Pp
.Fd "#define ASCII_NBRSP"
.Fd "#define ASCII_HYPH"
.Fd "#define ASCII_BREAK"
.Ft struct mparse *
.Fo mparse_alloc
.Fa "int options"
.Fa "enum mandoc_os os_e"
.Fa "char *os_s"
.Fc
.Ft void
.Fo mparse_free
.Fa "struct mparse *parse"
.Fc
.Ft void
.Fo mparse_copy
.Fa "const struct mparse *parse"
.Fc
.Ft int
.Fo mparse_open
.Fa "struct mparse *parse"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_readfd
.Fa "struct mparse *parse"
.Fa "int fd"
.Fa "const char *fname"
.Fc
.Ft void
.Fo mparse_reset
.Fa "struct mparse *parse"
.Fc
.Ft void
.Fo mparse_result
.Fa "struct mparse *parse"
.Fa "struct roff_man **man"
.Fa "char **sodest"
.Fc
.In roff.h
.Ft void
.Fo deroff
.Fa "char **dest"
.Fa "const struct roff_node *node"
.Fc
.In sys/types.h
.In mandoc.h
.In mdoc.h
.Vt extern const char * const * mdoc_argnames;
.Vt extern const char * const * mdoc_macronames;
.Ft void
.Fo mdoc_validate
.Fa "struct roff_man *mdoc"
.Fc
.In sys/types.h
.In mandoc.h
.In man.h
.Vt extern const char * const * man_macronames;
.Ft void
.Fo man_validate
.Fa "struct roff_man *man"
.Fc
.Sh DESCRIPTION
The
.Nm mandoc
library parses a
.Ux
manual into an abstract syntax tree (AST).
.Ux
manuals are composed of
.Xr mdoc 7
or
.Xr man 7 ,
and may be mixed with
.Xr roff 7 ,
.Xr tbl 7 ,
and
.Xr eqn 7
invocations.
.Pp
The following describes a general parse sequence:
.Bl -enum
.It
initiate a parsing sequence with
.Xr mchars_alloc 3
and
.Fn mparse_alloc ;
.It
open a file with
.Xr open 2
or
.Fn mparse_open ;
.It
parse it with
.Fn mparse_readfd ;
.It
close it with
.Xr close 2 ;
.It
retrieve the syntax tree with
.Fn mparse_result ;
.It
depending on whether the
.Fa macroset
member of the returned
.Vt struct roff_man
is
.Dv MACROSET_MDOC
or
.Dv MACROSET_MAN ,
validate it with
.Fn mdoc_validate
or
.Fn man_validate ,
respectively;
.It
if information about the validity of the input is needed, fetch it with
.Fn mparse_updaterc ;
.It
iterate over parse nodes starting from the
.Fa first
member of the returned
.Vt struct roff_man ;
.It
free all allocated memory with
.Fn mparse_free
and
.Xr mchars_free 3 ,
or invoke
.Fn mparse_reset
and go back to step 2 to parse new files.
.El
.Sh REFERENCE
This section documents the functions, types, and variables available
via
.In mandoc.h ,
with the exception of those documented in
.Xr mandoc_escape 3
and
.Xr mchars_alloc 3 .
.Ss Types
.Bl -ohang
.It Vt "enum mandocerr"
An error or warning message during parsing.
.It Vt "enum mandoclevel"
A classification of an
.Vt "enum mandocerr"
as regards system operation.
See the DIAGNOSTICS section in
.Xr mandoc 1
regarding the meanings of the levels.
.It Vt "struct mparse"
An opaque pointer to a running parse sequence.
Created with
.Fn mparse_alloc
and freed with
.Fn mparse_free .
This may be used across parsed input if
.Fn mparse_reset
is called between parses.
.El
.Ss Functions
.Bl -ohang
.It Fn deroff
Obtain a text-only representation of a
.Vt struct roff_node ,
including text contained in its child nodes.
To be used on children of the
.Fa first
member of
.Vt struct roff_man .
When it is no longer needed, the pointer returned from
.Fn deroff
can be passed to
.Xr free 3 .
.It Fn man_validate
Validate the
.Dv MACROSET_MAN
parse tree obtained with
.Fn mparse_result .
Declared in
.In man.h ,
implemented in
.Pa man.c .
.It Fn mdoc_validate
Validate the
.Dv MACROSET_MDOC
parse tree obtained with
.Fn mparse_result .
Declared in
.In mdoc.h ,
implemented in
.Pa mdoc.c .
.It Fn mparse_alloc
Allocate a parser.
The arguments have the following effect:
.Bl -tag -offset 5n -width inttype
.It Ar options
When the
.Dv MPARSE_MDOC
or
.Dv MPARSE_MAN
bit is set, only that parser is used.
Otherwise, the document type is automatically detected.
.Pp
When the
.Dv MPARSE_SO
bit is set,
.Xr roff 7
.Ic \&so
file inclusion requests are always honoured.
Otherwise, if the request is the only content in an input file,
only the file name is remembered, to be returned in the
.Fa sodest
argument of
.Fn mparse_result .
.Pp
When the
.Dv MPARSE_QUICK
bit is set, parsing is aborted after the NAME section.
This is useful, for example, in
.Xr makewhatis 8
.Fl Q
to quickly build minimal databases.
.It Ar os_e
Operating system to check base system conventions for.
If
.Dv MANDOC_OS_OTHER ,
the system is automatically detected from
.Ic \&Os ,
.Fl Ios ,
or
.Xr uname 3 .
.It Ar os_s
A default string for the
.Xr mdoc 7
.Ic \&Os
macro, overriding the
.Dv OSNAME
preprocessor definition and the results of
.Xr uname 3 .
Passing
.Dv NULL
sets no default.
.El
.Pp
The same parser may be used for multiple files so long as
.Fn mparse_reset
is called between parses.
.Fn mparse_free
must be called to free the memory allocated by this function.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_free
Free all memory allocated by
.Fn mparse_alloc .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_copy
Dump a copy of the input to the standard output; used for
.Fl man Fl T Ns Cm man .
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_open
Open the file for reading.
If that fails and
.Fa fname
does not already end in
.Ql .gz ,
try again after appending
.Ql .gz .
Remember whether the file is compressed.
Return a file descriptor open for reading or -1 on failure.
The descriptor can be passed to
.Fn mparse_readfd
or used directly.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_readfd
Parse a file descriptor opened with
.Xr open 2
or
.Fn mparse_open .
Pass the associated filename in
.Va fname .
This function may be called multiple times with different parameters; however,
.Xr close 2
and
.Fn mparse_reset
should be invoked between parses.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_reset
Reset a parser so that
.Fn mparse_readfd
may be used again.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.It Fn mparse_result
Obtain the result of a parse.
One of the two pointers will be filled in.
Declared in
.In mandoc.h ,
implemented in
.Pa read.c .
.El
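The fallback described for mparse_open above can be sketched in plain C.
This is only an illustration of the documented behaviour, not the code in read.c:
the helper name open_with_gz_fallback, its gzipped out-parameter, and the fixed
buffer size are assumptions made for the example.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper, not part of mandoc: open fname read-only and,
 * if that fails and fname does not end in ".gz", retry with ".gz"
 * appended.  On success, *gzipped tells whether the name that was
 * opened ends in ".gz".  Returns a file descriptor or -1 on failure.
 */
int
open_with_gz_fallback(const char *fname, int *gzipped)
{
	char	 buf[4096];	/* assumed upper bound for the example */
	size_t	 len;
	int	 fd, n;

	len = strlen(fname);
	*gzipped = len >= 3 && strcmp(fname + len - 3, ".gz") == 0;

	/* First attempt: the name exactly as given. */
	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;
	if (*gzipped)
		return -1;	/* the compressed name was already tried */

	/* Second attempt: the same name with ".gz" appended. */
	n = snprintf(buf, sizeof(buf), "%s.gz", fname);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return -1;	/* name does not fit into the buffer */
	if ((fd = open(buf, O_RDONLY)) != -1)
		*gzipped = 1;
	return fd;
}
```

A caller would hand the returned descriptor to the parser and close it afterwards;
how compressed input is actually decoded is outside the scope of this sketch.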
.Ss Variables
.Bl -ohang
.It Va man_macronames
The string representation of a
.Xr man 7
macro as indexed by
.Vt "enum mant" .
.It Va mdoc_argnames
The string representation of an
.Xr mdoc 7
macro argument as indexed by
.Vt "enum mdocargt" .
.It Va mdoc_macronames
The string representation of an
.Xr mdoc 7
macro as indexed by
.Vt "enum mdoct" .
.El
.Sh IMPLEMENTATION NOTES
This section consists of structural documentation for
.Xr mdoc 7
and
.Xr man 7
syntax trees and strings.
.Ss Man and Mdoc Strings
Strings may be extracted from mdoc and man meta-data, or from text
nodes (MDOC_TEXT and MAN_TEXT, respectively).
These strings have special non-printing formatting cues embedded in the
text itself, as well as
.Xr roff 7
escapes preserved from input.
Implementing systems will need to handle both situations to produce
human-readable text.
In general, strings may be assumed to consist of 7-bit ASCII characters.
.Pp
The following non-printing characters may be embedded in text strings:
.Bl -tag -width Ds
.It Dv ASCII_NBRSP
A non-breaking space character.
.It Dv ASCII_HYPH
A soft hyphen.
.It Dv ASCII_BREAK
A breakable zero-width space.
.El
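As a rough illustration of handling these cues, the following sketch rewrites a
string in place for plain-text output. The helper name strip_cues is hypothetical,
the numeric values of the three constants are assumptions that real code should
take from mandoc.h instead, and the mapping chosen here (space, hyphen-minus,
nothing) is only one plausible policy for a plain-text consumer.

```c
#include <stddef.h>

/* Assumed values for the example; real code should include <mandoc.h>. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31	/* non-breaking space */
#define ASCII_HYPH	30	/* soft hyphen */
#define ASCII_BREAK	29	/* breakable zero-width space */
#endif

/*
 * Hypothetical helper: replace ASCII_NBRSP with a space and
 * ASCII_HYPH with '-', and drop ASCII_BREAK entirely.
 * The string is modified in place; the new length is returned.
 */
size_t
strip_cues(char *s)
{
	char	*src, *dst;

	for (src = dst = s; *src != '\0'; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			*dst++ = ' ';
			break;
		case ASCII_HYPH:
			*dst++ = '-';
			break;
		case ASCII_BREAK:
			/* zero width: nothing to emit */
			break;
		default:
			*dst++ = *src;
			break;
		}
	}
	*dst = '\0';
	return (size_t)(dst - s);
}
```

A formatter that performs line filling would instead keep these characters until
it has decided where to break lines, and only then map them to output glyphs.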
.Pp
Escape sequences are also passed verbatim into text strings.
An escape sequence is a sequence of characters beginning with the
backslash
.Pq Sq \e .
To construct human-readable text, these should be intercepted with
.Xr mandoc_escape 3
and converted with one of the functions described in
.Xr mchars_alloc 3 .
.Ss Man Abstract Syntax Tree
This AST is governed by the ontological rules dictated in
.Xr man 7
and derives its terminology accordingly.
.Pp
The AST is composed of
.Vt struct roff_node
nodes with block, head, body, element, root and text types as declared by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va next
and
.Va prev
fields) and some type-specific data.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- ELEMENT | TEXT | BLOCK
.It BLOCK
\(<- HEAD BODY
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode*
.It ELEMENT
\(<- ELEMENT | TEXT*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
The only elements capable of nesting other elements are those with
next-line scope as documented in
.Xr man 7 .
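A tree in this normal form can be walked through the child and next links alone.
The sketch below collects the strings of all TEXT nodes in document order.
It deliberately uses a small stand-in structure: struct sketch_node, enum
sketch_type and collect_text are hypothetical names for this example, and only
the child, next, parent and type fields mirror what is described above.
Real code would use struct roff_node from roff.h, or simply call deroff.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for the node types and for struct roff_node. */
enum sketch_type { SK_ROOT, SK_BLOCK, SK_HEAD, SK_BODY, SK_ELEM, SK_TEXT };

struct sketch_node {
	struct sketch_node	*parent;
	struct sketch_node	*child;		/* first child, or NULL */
	struct sketch_node	*next;		/* next sibling, or NULL */
	enum sketch_type	 type;
	const char		*string;	/* set for SK_TEXT only */
};

/*
 * Visit n, its later siblings, and all of their descendants in
 * document order, appending the string of every TEXT node to buf,
 * separated by single spaces.  buf must hold a NUL-terminated
 * string shorter than bufsz on entry; excess output is truncated.
 */
void
collect_text(const struct sketch_node *n, char *buf, size_t bufsz)
{
	size_t	 used;

	for (; n != NULL; n = n->next) {
		if (n->type == SK_TEXT && n->string != NULL) {
			used = strlen(buf);
			snprintf(buf + used, bufsz - used, "%s%s",
			    used > 0 ? " " : "", n->string);
		}
		/* Descend before moving on to the next sibling. */
		collect_text(n->child, buf, bufsz);
	}
}
```

Calling collect_text on the root node with an empty buffer yields the text of the
whole tree; recursion depth equals the nesting depth of the document.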
.Ss Mdoc Abstract Syntax Tree
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are described simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct roff_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va pos ,
and
.Va sec
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va last ,
.Va next
and
.Va prev
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
.Va tok
field.
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:ascii:]]*
.El
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&Bl \-column ,
where a new body introduces a new phrase.
.Pp
The
.Xr mdoc 7
syntax tree accommodates broken block structures as well.
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Ic \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Ic \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged ;
for example, the
.Fl T Ns Cm html
front-end to
.Xr mandoc 1
is unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr man.cgi 3 ,
.Xr mandoc_escape 3 ,
.Xr mandoc_headers 3 ,
.Xr mandoc_malloc 3 ,
.Xr mansearch 3 ,
.Xr mchars_alloc 3 ,
.Xr tbl 3 ,
.Xr eqn 7 ,
.Xr man 7 ,
.Xr mandoc_char 7 ,
.Xr mdoc 7 ,
.Xr roff 7 ,
.Xr tbl 7
.Sh AUTHORS
.An -nosplit
The
.Nm
library was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
and is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org .