1 .\" $Id: mandoc.3,v 1.44 2018/12/30 00:49:55 schwarze Exp $
2 .\"
3 .\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
4 .\" Copyright (c) 2010-2017 Ingo Schwarze <schwarze@openbsd.org>
5 .\"
6 .\" Permission to use, copy, modify, and distribute this software for any
7 .\" purpose with or without fee is hereby granted, provided that the above
8 .\" copyright notice and this permission notice appear in all copies.
9 .\"
10 .\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
11 .\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
12 .\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
13 .\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
14 .\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
15 .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
16 .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
17 .\"
18 .Dd $Mdocdate: December 30 2018 $
19 .Dt MANDOC 3
20 .Os
21 .Sh NAME
22 .Nm mandoc ,
23 .Nm deroff ,
24 .Nm mparse_alloc ,
25 .Nm mparse_copy ,
26 .Nm mparse_free ,
27 .Nm mparse_open ,
28 .Nm mparse_readfd ,
29 .Nm mparse_reset ,
30 .Nm mparse_result
31 .Nd mandoc macro compiler library
32 .Sh SYNOPSIS
33 .In sys/types.h
34 .In stdio.h
35 .In mandoc.h
36 .Pp
37 .Fd "#define ASCII_NBRSP"
38 .Fd "#define ASCII_HYPH"
39 .Fd "#define ASCII_BREAK"
40 .Ft struct mparse *
41 .Fo mparse_alloc
42 .Fa "int options"
43 .Fa "enum mandoc_os os_e"
44 .Fa "char *os_s"
45 .Fc
46 .Ft void
47 .Fo mparse_free
48 .Fa "struct mparse *parse"
49 .Fc
50 .Ft void
51 .Fo mparse_copy
52 .Fa "const struct mparse *parse"
53 .Fc
54 .Ft int
55 .Fo mparse_open
56 .Fa "struct mparse *parse"
57 .Fa "const char *fname"
58 .Fc
59 .Ft void
60 .Fo mparse_readfd
61 .Fa "struct mparse *parse"
62 .Fa "int fd"
63 .Fa "const char *fname"
64 .Fc
65 .Ft void
66 .Fo mparse_reset
67 .Fa "struct mparse *parse"
68 .Fc
69 .Ft struct roff_meta *
70 .Fo mparse_result
71 .Fa "struct mparse *parse"
72 .Fc
73 .In roff.h
74 .Ft void
75 .Fo deroff
76 .Fa "char **dest"
77 .Fa "const struct roff_node *node"
78 .Fc
79 .In sys/types.h
80 .In mandoc.h
81 .In mdoc.h
82 .Vt extern const char * const * mdoc_argnames;
83 .Vt extern const char * const * mdoc_macronames;
84 .In sys/types.h
85 .In mandoc.h
86 .In man.h
87 .Vt extern const char * const * man_macronames;
88 .Sh DESCRIPTION
89 The
90 .Nm mandoc
91 library parses a
92 .Ux
93 manual into an abstract syntax tree (AST).
94 .Ux
95 manuals are composed of
96 .Xr mdoc 7
97 or
98 .Xr man 7 ,
99 and may be mixed with
100 .Xr roff 7 ,
101 .Xr tbl 7 ,
102 and
103 .Xr eqn 7
104 invocations.
105 .Pp
106 The following describes a general parse sequence:
107 .Bl -enum
108 .It
109 initiate a parsing sequence with
110 .Xr mchars_alloc 3
111 and
112 .Fn mparse_alloc ;
113 .It
114 open a file with
115 .Xr open 2
116 or
117 .Fn mparse_open ;
118 .It
119 parse it with
120 .Fn mparse_readfd ;
121 .It
122 close it with
123 .Xr close 2 ;
124 .It
125 retrieve the syntax tree with
126 .Fn mparse_result ;
127 .It
128 if information about the validity of the input is needed, fetch it with
129 .Fn mparse_updaterc ;
130 .It
131 iterate over parse nodes, starting from the
132 .Fa first
133 member of the returned
134 .Vt struct roff_meta ;
135 .It
136 free all allocated memory with
137 .Fn mparse_free
138 and
139 .Xr mchars_free 3 ,
140 or invoke
141 .Fn mparse_reset
142 and go back to step 2 to parse new files.
143 .El
144 .Sh REFERENCE
145 This section documents the functions, types, and variables available
146 via
147 .In mandoc.h ,
148 with the exception of those documented in
149 .Xr mandoc_escape 3
150 and
151 .Xr mchars_alloc 3 .
152 .Ss Types
153 .Bl -ohang
154 .It Vt "enum mandocerr"
155 An error or warning message during parsing.
156 .It Vt "enum mandoclevel"
157 A classification of an
158 .Vt "enum mandocerr"
159 as regards system operation.
160 See the DIAGNOSTICS section in
161 .Xr mandoc 1
162 regarding the meanings of the levels.
163 .It Vt "struct mparse"
164 An opaque pointer to a running parse sequence.
165 Created with
166 .Fn mparse_alloc
167 and freed with
168 .Fn mparse_free .
169 This may be used across parsed input if
170 .Fn mparse_reset
171 is called between parses.
172 .El
173 .Ss Functions
174 .Bl -ohang
175 .It Fn deroff
176 Obtain a text-only representation of a
177 .Vt struct roff_node ,
178 including text contained in its child nodes.
179 To be used on children of the
180 .Fa first
181 member of
182 .Vt struct roff_meta .
183 When it is no longer needed, the pointer that
184 .Fn deroff
185 stores in *dest can be passed to
186 .Xr free 3 .
187 .It Fn mparse_alloc
188 Allocate a parser.
189 The arguments have the following effect:
190 .Bl -tag -offset 5n -width inttype
191 .It Ar options
192 When the
193 .Dv MPARSE_MDOC
194 or
195 .Dv MPARSE_MAN
196 bit is set, only that parser is used.
197 Otherwise, the document type is automatically detected.
198 .Pp
199 When the
200 .Dv MPARSE_SO
201 bit is set,
202 .Xr roff 7
203 .Ic \&so
204 file inclusion requests are always honoured.
205 Otherwise, if the request is the only content in an input file,
206 only the file name is remembered, to be returned in the
207 .Fa sodest
208 field of
209 .Vt struct roff_meta .
210 .Pp
211 When the
212 .Dv MPARSE_QUICK
213 bit is set, parsing is aborted after the NAME section.
214 This is useful, for example, in
215 .Xr makewhatis 8
216 .Fl Q
217 to quickly build minimal databases.
218 .Pp
219 When the
220 .Dv MPARSE_VALIDATE
221 bit is set,
222 .Fn mparse_result
223 runs the validation functions before returning the syntax tree.
224 This is almost always required, except in certain debugging scenarios,
225 for example to dump unvalidated syntax trees.
226 .It Ar os_e
227 Operating system to check base system conventions for.
228 If
229 .Dv MANDOC_OS_OTHER ,
230 the system is automatically detected from
231 .Ic \&Os ,
232 .Fl Ios ,
233 or
234 .Xr uname 3 .
235 .It Ar os_s
236 A default string for the
237 .Xr mdoc 7
238 .Ic \&Os
239 macro, overriding the
240 .Dv OSNAME
241 preprocessor definition and the results of
242 .Xr uname 3 .
243 Passing
244 .Dv NULL
245 sets no default.
246 .El
247 .Pp
248 The same parser may be used for multiple files so long as
249 .Fn mparse_reset
250 is called between parses.
251 .Fn mparse_free
252 must be called to free the memory allocated by this function.
253 Declared in
254 .In mandoc.h ,
255 implemented in
256 .Pa read.c .
257 .It Fn mparse_free
258 Free all memory allocated by
259 .Fn mparse_alloc .
260 Declared in
261 .In mandoc.h ,
262 implemented in
263 .Pa read.c .
264 .It Fn mparse_copy
265 Dump a copy of the input to the standard output; used for
266 .Fl man T Ns Cm man .
267 Declared in
268 .In mandoc.h ,
269 implemented in
270 .Pa read.c .
271 .It Fn mparse_open
272 Open the file for reading.
273 If that fails and
274 .Fa fname
275 does not already end in
276 .Ql .gz ,
277 try again after appending
278 .Ql .gz .
279 Record whether the file is compressed.
280 Return a file descriptor open for reading or -1 on failure.
281 It can be passed to
282 .Fn mparse_readfd
283 or used directly.
284 Declared in
285 .In mandoc.h ,
286 implemented in
287 .Pa read.c .
288 .It Fn mparse_readfd
289 Parse a file descriptor opened with
290 .Xr open 2
291 or
292 .Fn mparse_open .
293 Pass the associated filename in
294 .Fa fname .
295 This function may be called multiple times with different parameters; however,
296 .Xr close 2
297 and
298 .Fn mparse_reset
299 should be invoked between parses.
300 Declared in
301 .In mandoc.h ,
302 implemented in
303 .Pa read.c .
304 .It Fn mparse_reset
305 Reset a parser so that
306 .Fn mparse_readfd
307 may be used again.
308 Declared in
309 .In mandoc.h ,
310 implemented in
311 .Pa read.c .
312 .It Fn mparse_result
313 Obtain the result of a parse.
314 Declared in
315 .In mandoc.h ,
316 implemented in
317 .Pa read.c .
318 .El
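.Pp
The retry logic described for
.Fn mparse_open
can be sketched in a few lines of POSIX C.
This is only an illustration of the documented behaviour, not the code in
.Pa read.c ;
the function name
.Fn open_with_gz_fallback
and its
.Fa gzipped
out-parameter are hypothetical, and no decompression is attempted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the mandoc API: open fname for
 * reading; if that fails and fname does not already end in ".gz",
 * try again with ".gz" appended.  *gzipped is set to 1 when the
 * name that was tried last ends in ".gz", else to 0.
 * Returns a file descriptor open for reading, or -1 on failure.
 */
int
open_with_gz_fallback(const char *fname, int *gzipped)
{
	char	 gzname[4096];
	size_t	 len;
	int	 fd, n;

	assert(fname != NULL && gzipped != NULL);

	len = strlen(fname);
	*gzipped = len > 3 && strcmp(fname + len - 3, ".gz") == 0;

	if ((fd = open(fname, O_RDONLY)) != -1)
		return fd;
	if (*gzipped)
		return -1;	/* already a .gz name: nothing left to try */

	/* Build the fallback name, refusing to use a truncated path. */
	n = snprintf(gzname, sizeof(gzname), "%s.gz", fname);
	if (n < 0 || (size_t)n >= sizeof(gzname))
		return -1;

	if ((fd = open(gzname, O_RDONLY)) != -1)
		*gzipped = 1;
	return fd;
}
```

A caller would hand the returned descriptor to a reader of its choice and
consult the flag to decide whether the data needs to be decompressed first.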
319 .Ss Variables
320 .Bl -ohang
321 .It Va man_macronames
322 The string representation of a
323 .Xr man 7
324 macro as indexed by
325 .Vt "enum mant" .
326 .It Va mdoc_argnames
327 The string representation of an
328 .Xr mdoc 7
329 macro argument as indexed by
330 .Vt "enum mdocargt" .
331 .It Va mdoc_macronames
332 The string representation of an
333 .Xr mdoc 7
334 macro as indexed by
335 .Vt "enum mdoct" .
336 .El
337 .Sh IMPLEMENTATION NOTES
338 This section consists of structural documentation for
339 .Xr mdoc 7
340 and
341 .Xr man 7
342 syntax trees and strings.
343 .Ss Man and Mdoc Strings
344 Strings may be extracted from mdoc and man meta-data, or from text
345 nodes (MDOC_TEXT and MAN_TEXT, respectively).
346 These strings have special non-printing formatting cues embedded in the
347 text itself, as well as
348 .Xr roff 7
349 escapes preserved from input.
350 Implementing systems will need to handle both situations to produce
351 human-readable text.
352 In general, strings may be assumed to consist of 7-bit ASCII characters.
353 .Pp
354 The following non-printing characters may be embedded in text strings:
355 .Bl -tag -width Ds
356 .It Dv ASCII_NBRSP
357 A non-breaking space character.
358 .It Dv ASCII_HYPH
359 A soft hyphen.
360 .It Dv ASCII_BREAK
361 A breakable zero-width space.
362 .El
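.Pp
As a minimal sketch of how a formatter might map these cues to plain
ASCII, consider the following C function.
The name
.Fn plain_text
is hypothetical, and the numeric values given for the three constants
are assumptions made only to keep the example self-contained; real code
should take them from
.In mandoc.h .
Printing
.Dv ASCII_HYPH
as a hyphen-minus is likewise just one possible choice.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration; normally provided by <mandoc.h>. */
#ifndef ASCII_NBRSP
#define ASCII_NBRSP	31	/* non-breaking space */
#define ASCII_HYPH	30	/* soft hyphen */
#define ASCII_BREAK	29	/* breakable zero-width space */
#endif

/*
 * Hypothetical helper: copy src to dst, replacing the non-printing
 * formatting cues with plain ASCII.  At most dstsz bytes are written,
 * including the terminating NUL.  Returns the length of the result.
 */
size_t
plain_text(char *dst, size_t dstsz, const char *src)
{
	size_t	 n = 0;

	assert(dst != NULL && src != NULL);
	if (dstsz == 0)
		return 0;

	for (; *src != '\0' && n + 1 < dstsz; src++) {
		switch (*src) {
		case ASCII_NBRSP:
			dst[n++] = ' ';	/* breakability is lost here */
			break;
		case ASCII_HYPH:
			dst[n++] = '-';
			break;
		case ASCII_BREAK:
			break;		/* zero width: emit nothing */
		default:
			dst[n++] = *src;
			break;
		}
	}
	dst[n] = '\0';
	return n;
}
```

The sketch copies backslashes unchanged, so escape sequences still have
to be handled separately.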
363 .Pp
364 Escape sequences are also passed verbatim into text strings.
365 An escape sequence is a sequence of characters beginning with the
366 backslash
367 .Pq Sq \e .
368 To construct human-readable text, these should be intercepted with
369 .Xr mandoc_escape 3
370 and converted with one of the functions described in
371 .Xr mchars_alloc 3 .
372 .Ss Man Abstract Syntax Tree
373 This AST is governed by the ontological rules dictated in
374 .Xr man 7
375 and derives its terminology accordingly.
376 .Pp
377 The AST is composed of
378 .Vt struct roff_node
379 nodes with block, head, body, element, root and text types as declared by the
380 .Va type
381 field.
382 Each node also provides its parse point (the
383 .Va line ,
384 .Va pos ,
385 and
386 .Va sec
387 fields), its position in the tree (the
388 .Va parent ,
389 .Va child ,
390 .Va next
391 and
392 .Va prev
393 fields) and some type-specific data.
394 .Pp
395 The tree itself is arranged according to the following normal form,
396 where capitalised non-terminals represent nodes.
397 .Pp
398 .Bl -tag -width "ELEMENTXX" -compact
399 .It ROOT
400 \(<- mnode+
401 .It mnode
402 \(<- ELEMENT | TEXT | BLOCK
403 .It BLOCK
404 \(<- HEAD BODY
405 .It HEAD
406 \(<- mnode*
407 .It BODY
408 \(<- mnode*
409 .It ELEMENT
410 \(<- ELEMENT | TEXT*
411 .It TEXT
412 \(<- [[:ascii:]]*
413 .El
414 .Pp
415 The only elements capable of nesting other elements are those with
416 next-line scope as documented in
417 .Xr man 7 .
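.Pp
Following the
.Va child
and
.Va next
links depth-first visits the nodes of such a tree in document order.
The sketch below collects the words of all text nodes in that way.
It uses a deliberately simplified stand-in for
.Vt struct roff_node ;
the
.Vt struct walk_node
type, its enumeration constants and the function
.Fn walk_collect
are hypothetical names chosen for this example and are not declared in
any mandoc header.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for struct roff_node: only the fields used here. */
enum walk_type {
	WALK_ROOT, WALK_BLOCK, WALK_HEAD, WALK_BODY, WALK_ELEM, WALK_TEXT
};

struct walk_node {
	enum walk_type	  type;
	const char	 *string;	/* the text, for WALK_TEXT only */
	struct walk_node *child;	/* first child, or NULL */
	struct walk_node *next;		/* next sibling, or NULL */
};

/*
 * Append the text of n and of everything below it to buf, in document
 * order, separating words with single spaces.  buf must contain a
 * NUL-terminated string on entry (possibly empty); text that does not
 * fit into bufsz bytes is silently dropped.
 */
void
walk_collect(const struct walk_node *n, char *buf, size_t bufsz)
{
	const struct walk_node	*c;
	size_t			 len;

	assert(n != NULL && buf != NULL && bufsz > 0);

	if (n->type == WALK_TEXT && n->string != NULL && *n->string != '\0') {
		len = strlen(buf);
		snprintf(buf + len, bufsz - len, "%s%s",
		    len > 0 ? " " : "", n->string);
	}

	/* Visit the children left to right; each call handles a subtree. */
	for (c = n->child; c != NULL; c = c->next)
		walk_collect(c, buf, bufsz);
}
```

Called on a ROOT node with an empty buffer, the function yields the
words of the HEAD before those of the BODY, because HEAD precedes BODY
among the children of a BLOCK.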
418 .Ss Mdoc Abstract Syntax Tree
419 This AST is governed by the ontological
420 rules dictated in
421 .Xr mdoc 7
422 and derives its terminology accordingly.
423 .Qq In-line
424 elements described in
425 .Xr mdoc 7
426 are described simply as
427 .Qq elements .
428 .Pp
429 The AST is composed of
430 .Vt struct roff_node
431 nodes with block, head, body, element, root and text types as declared
432 by the
433 .Va type
434 field.
435 Each node also provides its parse point (the
436 .Va line ,
437 .Va pos ,
438 and
439 .Va sec
440 fields), its position in the tree (the
441 .Va parent ,
442 .Va child ,
443 .Va last ,
444 .Va next
445 and
446 .Va prev
447 fields) and some type-specific data, in particular, for nodes generated
448 from macros, the generating macro in the
449 .Va tok
450 field.
451 .Pp
452 The tree itself is arranged according to the following normal form,
453 where capitalised non-terminals represent nodes.
454 .Pp
455 .Bl -tag -width "ELEMENTXX" -compact
456 .It ROOT
457 \(<- mnode+
458 .It mnode
459 \(<- BLOCK | ELEMENT | TEXT
460 .It BLOCK
461 \(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
462 .It ELEMENT
463 \(<- TEXT*
464 .It HEAD
465 \(<- mnode*
466 .It BODY
467 \(<- mnode* [ENDBODY mnode*]
468 .It TAIL
469 \(<- mnode*
470 .It TEXT
471 \(<- [[:ascii:]]*
472 .El
473 .Pp
474 Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
475 the BLOCK production: these refer to punctuation marks.
476 Furthermore, although a TEXT node will generally have a non-zero-length
477 string, in the specific case of
478 .Sq \&.Bd \-literal ,
479 an empty line will produce a zero-length string.
480 Multiple body parts are only found in invocations of
481 .Sq \&Bl \-column ,
482 where a new body introduces a new phrase.
483 .Pp
484 The
485 .Xr mdoc 7
486 syntax tree accommodates broken block structures as well.
487 The ENDBODY node is available to end the formatting associated
488 with a given block before the physical end of that block.
489 It has a non-null
490 .Va end
491 field, is of the BODY
492 .Va type ,
493 has the same
494 .Va tok
495 as the BLOCK it is ending, and has a
496 .Va pending
497 field pointing to that BLOCK's BODY node.
498 It is an indirect child of that BODY node
499 and has no children of its own.
500 .Pp
501 An ENDBODY node is generated when a block ends while one of its child
502 blocks is still open, as in the following example:
503 .Bd -literal -offset indent
504 \&.Ao ao
505 \&.Bo bo ac
506 \&.Ac bc
507 \&.Bc end
508 .Ed
509 .Pp
510 This example results in the following block structure:
511 .Bd -literal -offset indent
512 BLOCK Ao
513     HEAD Ao
514     BODY Ao
515         TEXT ao
516         BLOCK Bo, pending -> Ao
517             HEAD Bo
518             BODY Bo
519                 TEXT bo
520                 TEXT ac
521                 ENDBODY Ao, pending -> Ao
522                 TEXT bc
523 TEXT end
524 .Ed
525 .Pp
526 Here, the formatting of the
527 .Ic \&Ao
528 block extends from TEXT ao to TEXT ac,
529 while the formatting of the
530 .Ic \&Bo
531 block extends from TEXT bo to TEXT bc.
532 It renders as follows in
533 .Fl T Ns Cm ascii
534 mode:
535 .Pp
536 .Dl <ao [bo ac> bc] end
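.Pp
To make the role of ENDBODY concrete, the following sketch models the
tree shown above with a simplified node type and renders it.
Everything in it is an assumption made for illustration: the
.Vt struct eb_node
type, the
.Va ended
flag, the delimiter strings stored in the BODY nodes and the spacing
rule are hypothetical and do not reproduce the data structures or the
formatter of
.Xr mandoc 1 .
A BODY prints its closing delimiter at its physical end unless an
ENDBODY node reached earlier has already printed it.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified model; not the real struct roff_node. */
enum eb_type { EB_ROOT, EB_BLOCK, EB_HEAD, EB_BODY, EB_ENDBODY, EB_TEXT };

struct eb_node {
	enum eb_type	 type;
	const char	*open, *close;	/* BODY: delimiters to print */
	const char	*string;	/* TEXT: the word itself */
	struct eb_node	*pending;	/* ENDBODY: the BODY it ends */
	struct eb_node	*child, *next;
	int		 ended;		/* BODY: closed early by ENDBODY */
};

struct eb_out {
	char	 buf[128];
	int	 nospace;	/* no space before the next token */
};

/* Append s; closing delimiters (attach != 0) take no leading space. */
static void
eb_put(struct eb_out *o, const char *s, int attach)
{
	size_t	 len = strlen(o->buf);

	snprintf(o->buf + len, sizeof(o->buf) - len, "%s%s",
	    (len > 0 && !attach && !o->nospace) ? " " : "", s);
	o->nospace = 0;
}

/* Render n and its children; siblings are left to the caller's loop. */
void
eb_render(struct eb_node *n, struct eb_out *o)
{
	struct eb_node	*c;

	assert(n != NULL && o != NULL);
	if (n->type == EB_TEXT) {
		eb_put(o, n->string, 0);
		return;
	}
	if (n->type == EB_ENDBODY) {
		/* End the pending body's formatting right here. */
		eb_put(o, n->pending->close, 1);
		n->pending->ended = 1;
		return;
	}
	if (n->type == EB_BODY) {
		eb_put(o, n->open, 0);
		o->nospace = 1;
	}
	for (c = n->child; c != NULL; c = c->next)
		eb_render(c, o);
	/* Close at the physical end unless an ENDBODY already did. */
	if (n->type == EB_BODY && !n->ended)
		eb_put(o, n->close, 1);
}

/* Build the example tree from this section and render it into dst. */
void
eb_example(char *dst, size_t dstsz)
{
	struct eb_node ao_body = { .type = EB_BODY, .open = "<", .close = ">" };
	struct eb_node t_bc = { .type = EB_TEXT, .string = "bc" };
	struct eb_node endb = { .type = EB_ENDBODY, .pending = &ao_body,
	    .next = &t_bc };
	struct eb_node t_ac = { .type = EB_TEXT, .string = "ac", .next = &endb };
	struct eb_node t_bo = { .type = EB_TEXT, .string = "bo", .next = &t_ac };
	struct eb_node bo_body = { .type = EB_BODY, .open = "[", .close = "]",
	    .child = &t_bo };
	struct eb_node bo_head = { .type = EB_HEAD, .next = &bo_body };
	struct eb_node bo = { .type = EB_BLOCK, .child = &bo_head };
	struct eb_node t_ao = { .type = EB_TEXT, .string = "ao", .next = &bo };
	struct eb_node ao_head = { .type = EB_HEAD, .next = &ao_body };
	struct eb_node t_end = { .type = EB_TEXT, .string = "end" };
	struct eb_node ao = { .type = EB_BLOCK, .child = &ao_head,
	    .next = &t_end };
	struct eb_node root = { .type = EB_ROOT, .child = &ao };
	struct eb_out o = { "", 0 };

	ao_body.child = &t_ao;	/* set late: endb needs &ao_body first */
	eb_render(&root, &o);
	snprintf(dst, dstsz, "%s", o.buf);
}
```

In this model the closing delimiter of the Ao body is emitted while the
walk is still inside the Bo body, which is how the two ranges come to
overlap in the output.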
537 .Pp
538 Support for badly-nested blocks is only provided for backward
539 compatibility with some older
540 .Xr mdoc 7
541 implementations.
542 Using badly-nested blocks is
543 .Em strongly discouraged ;
544 for example, the
545 .Fl T Ns Cm html
546 front-end to
547 .Xr mandoc 1
548 is unable to render them in any meaningful way.
549 Furthermore, behaviour when encountering badly-nested blocks is not
550 consistent across troff implementations, especially when using multiple
551 levels of badly-nested blocks.
552 .Sh SEE ALSO
553 .Xr mandoc 1 ,
554 .Xr man.cgi 3 ,
555 .Xr mandoc_escape 3 ,
556 .Xr mandoc_headers 3 ,
557 .Xr mandoc_malloc 3 ,
558 .Xr mansearch 3 ,
559 .Xr mchars_alloc 3 ,
560 .Xr tbl 3 ,
561 .Xr eqn 7 ,
562 .Xr man 7 ,
563 .Xr mandoc_char 7 ,
564 .Xr mdoc 7 ,
565 .Xr roff 7 ,
566 .Xr tbl 7
567 .Sh AUTHORS
568 .An -nosplit
569 The
570 .Nm
571 library was written by
572 .An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
573 and is maintained by
574 .An Ingo Schwarze Aq Mt schwarze@openbsd.org .