.\" $Id: mdoc.3,v 1.47 2010/07/04 22:04:04 schwarze Exp $
.\"
.\" Copyright (c) 2009, 2010 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: July 4 2010 $
.Dt MDOC 3
.Os
.Sh NAME
.Nm mdoc ,
.Nm mdoc_alloc ,
.Nm mdoc_endparse ,
.Nm mdoc_free ,
.Nm mdoc_meta ,
.Nm mdoc_node ,
.Nm mdoc_parseln ,
.Nm mdoc_reset
.Nd mdoc macro compiler library
.Sh SYNOPSIS
.In mandoc.h
.In regs.h
.In mdoc.h
.Vt extern const char * const * mdoc_macronames;
.Vt extern const char * const * mdoc_argnames;
.Ft "struct mdoc *"
.Fo mdoc_alloc
.Fa "struct regset *regs"
.Fa "void *data"
.Fa "int pflags"
.Fa "mandocmsg msgs"
.Fc
.Ft int
.Fn mdoc_endparse "struct mdoc *mdoc"
.Ft void
.Fn mdoc_free "struct mdoc *mdoc"
.Ft "const struct mdoc_meta *"
.Fn mdoc_meta "const struct mdoc *mdoc"
.Ft "const struct mdoc_node *"
.Fn mdoc_node "const struct mdoc *mdoc"
.Ft int
.Fo mdoc_parseln
.Fa "struct mdoc *mdoc"
.Fa "int line"
.Fa "char *buf"
.Fc
.Ft int
.Fn mdoc_reset "struct mdoc *mdoc"
.Sh DESCRIPTION
The
.Nm mdoc
library parses lines of
.Xr mdoc 7
input
into an abstract syntax tree (AST).
.Pp
In general, applications initiate a parsing sequence with
.Fn mdoc_alloc ,
parse each line in a document with
.Fn mdoc_parseln ,
close the parsing session with
.Fn mdoc_endparse ,
operate over the syntax tree returned by
.Fn mdoc_node
and
.Fn mdoc_meta ,
then free all allocated memory with
.Fn mdoc_free .
The
.Fn mdoc_reset
function may be used to reset the parser for another input
sequence.
See the
.Sx EXAMPLES
section for a simple example.
.Pp
This section further defines the
.Sx Types ,
.Sx Functions
and
.Sx Variables
available to programmers.
Following that, the
.Sx Abstract Syntax Tree
section documents the output tree.
.Ss Types
Both functions (see
.Sx Functions )
and variables (see
.Sx Variables )
may use the following types:
.Bl -ohang
.It Vt struct mdoc
An opaque type defined in
.Pa mdoc.c .
Its values are only used privately within the library.
.It Vt struct mdoc_node
A parsed node.
Defined in
.Pa mdoc.h .
See
.Sx Abstract Syntax Tree
for details.
.It Vt mandocmsg
A function callback type defined in
.Pa mandoc.h .
.El
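.Pp
The
.Fa data
pointer given to
.Fn mdoc_alloc
is handed back to the
.Fa msgs
callback, which lets a caller keep per-parse state without globals.
The following is a minimal sketch of that pattern in isolation.
The callback shape used here (data pointer, line, column, message) is an
assumption for illustration and is not copied from
.Pa mandoc.h ;
the names msg_cb, struct msg_counter, count_msg and report are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback shape; the real mandocmsg type may differ. */
typedef int (*msg_cb)(void *data, int line, int col, const char *msg);

/* Caller-owned state, reached only through the opaque data pointer. */
struct msg_counter {
	int	 count;
	int	 last_line;
};

/* Example callback: records the message instead of printing it. */
static int
count_msg(void *data, int line, int col, const char *msg)
{
	struct msg_counter *c = data;

	assert(c != NULL);
	(void)col;
	(void)msg;
	c->count++;
	c->last_line = line;
	return 1;	/* this sketch treats non-zero as "keep parsing" */
}

/* Stand-in for library code reporting through the stored callback. */
static int
report(msg_cb cb, void *data, int line, int col, const char *msg)
{
	return cb(data, line, col, msg);
}
```

A caller would pass the address of its own struct msg_counter as the
data argument and count_msg as the callback; what the return value means
to the real library is likewise an assumption here.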
.Ss Functions
Function descriptions follow:
.Bl -ohang
.It Fn mdoc_alloc
Allocates a parsing structure.
The
.Fa data
pointer is passed unmodified to the
.Fa msgs
callback.
The
.Fa pflags
argument is a combination of the parser flags defined in
.Pa mdoc.h .
Returns NULL on failure.
If non-NULL, the pointer must be freed with
.Fn mdoc_free .
.It Fn mdoc_reset
Resets the parser for another parse sequence.
After its use,
.Fn mdoc_parseln
behaves as if invoked for the first time.
Returns 0 if memory could not be allocated.
.It Fn mdoc_free
Frees all resources of a parser.
The pointer is no longer valid after invocation.
.It Fn mdoc_parseln
Parses a NUL-terminated line of input.
This line should not contain the trailing newline.
The
.Fa line
argument is the number of that line in the input.
Returns 0 on failure, 1 on success.
The input buffer
.Fa buf
is modified by this function.
.It Fn mdoc_endparse
Signals that the parse is complete.
Note that if
.Fn mdoc_node
is called before
.Fn mdoc_endparse ,
the tree it returns is incomplete.
Returns 0 on failure, 1 on success.
.It Fn mdoc_node
Returns the first node of the parse.
Note that if
.Fn mdoc_parseln
or
.Fn mdoc_endparse
returned 0, the tree will be incomplete.
.It Fn mdoc_meta
Returns the document's parsed meta-data.
If this information has not yet been supplied, or if
.Fn mdoc_parseln
or
.Fn mdoc_endparse
returned 0, the data will be incomplete.
.El
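.Pp
Because
.Fn mdoc_parseln
expects a NUL-terminated line without its trailing newline, and modifies
the buffer it is given, a caller usually strips the newline in place and
never passes a string literal.
The sketch below shows only that preparation step; strip_newline is a
hypothetical helper and not part of the library.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: remove one trailing newline in place, leaving a
 * NUL-terminated line without it.  Returns the new length.
 */
static size_t
strip_newline(char *buf)
{
	size_t	 len;

	assert(buf != NULL);
	len = strlen(buf);
	if (len > 0 && buf[len - 1] == '\n')
		buf[--len] = '\0';
	return len;
}
```

The buffer must be writable (for example a char array or memory returned
by getline), since both this helper and the parser modify it.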
.Ss Variables
The following variables are also defined:
.Bl -ohang
.It Va mdoc_macronames
An array of macro names as strings, indexed by macro token.
.It Va mdoc_argnames
An array of macro argument names as strings, indexed by argument token.
.El
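.Pp
These arrays are mainly useful when printing or debugging a tree, where a
node's
.Va tok
value selects the macro name.
The sketch below mimics that lookup with a hypothetical three-entry token
enumeration and name table (enum tok, tok_names and tok_name are
illustrative only); the real enumeration in
.Pa mdoc.h
is much larger, and its values must not be inferred from this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token subset; the real list lives in mdoc.h. */
enum tok {
	TOK_Dd,
	TOK_Dt,
	TOK_Os,
	TOK_MAX
};

/* Name table indexed by token, in the style of mdoc_macronames. */
static const char *const tok_names[TOK_MAX] = {
	"Dd",
	"Dt",
	"Os",
};

/* Bounds-checked lookup: returns NULL for an out-of-range token. */
static const char *
tok_name(int tok)
{
	if (tok < 0 || tok >= TOK_MAX)
		return NULL;
	return tok_names[tok];
}
```

The bounds check is a design choice of this sketch: indexing the table
directly with an unchecked token would read past its end.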
.Ss Abstract Syntax Tree
The
.Nm
functions produce an abstract syntax tree (AST) describing input in a
regular form.
It may be reviewed at any time with
.Fn mdoc_node ;
however, if it is reviewed before
.Fn mdoc_endparse
has been called, or after
.Fn mdoc_endparse
or
.Fn mdoc_parseln
has failed, it may be incomplete.
.Pp
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are referred to simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct mdoc_node
nodes with block, head, body, tail, element, root and text types as
declared by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va sec ,
and
.Va pos
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va nchild ,
.Va next
and
.Va prev
fields) and some type-specific data; in particular, nodes generated
from macros record the generating macro in the
.Va tok
field.
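.Pp
Since every node links to its first child and its next sibling, a
depth-first walk needs only a few lines.
The following is a sketch over a hypothetical, trimmed-down node
structure that keeps just the
.Va type ,
.Va child
and
.Va next
fields; the real
.Vt struct mdoc_node
has many more fields, and the enumeration values here are not those of
.Pa mdoc.h .
The names struct node, enum ntype and count_type are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node types mirroring the kinds named above. */
enum ntype { N_ROOT, N_BLOCK, N_HEAD, N_BODY, N_TAIL, N_ELEM, N_TEXT };

/* Trimmed-down stand-in for struct mdoc_node. */
struct node {
	enum ntype	 type;
	struct node	*child;	/* first child, or NULL */
	struct node	*next;	/* next sibling, or NULL */
};

/*
 * Depth-first walk of n, its descendants and its following siblings,
 * visiting each node before its children; counts nodes of type want.
 */
static int
count_type(const struct node *n, enum ntype want)
{
	int	 count = 0;

	for ( ; n != NULL; n = n->next) {
		if (n->type == want)
			count++;
		count += count_type(n->child, want);
	}
	return count;
}
```

Called on the root node, this visits the whole tree; siblings are
followed iteratively and only children recurse, so recursion depth is
bounded by tree depth rather than by the number of siblings.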
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:printable:],0x1e]*
.El
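.Pp
The BLOCK production can be read as a small regular language over the
types of a block's children.
As an illustration only, the sketch below checks whether a sequence of
child types matches HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]; the
enumeration and the block_ok function are hypothetical, and the library
does not necessarily validate its trees this way.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical child-node types for this check. */
enum ctype { C_HEAD, C_BODY, C_TAIL, C_TEXT };

/*
 * Return 1 if the n child types in v match the BLOCK production
 * HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]], else 0.
 */
static int
block_ok(const enum ctype *v, size_t n)
{
	size_t	 i = 0;
	int	 bodies = 0;

	if (i == n || v[i++] != C_HEAD)		/* mandatory HEAD */
		return 0;
	if (i < n && v[i] == C_TEXT)		/* optional punctuation */
		i++;
	while (i < n && v[i] == C_BODY) {	/* one or more BODY */
		bodies++;
		i++;
		if (i < n && v[i] == C_TEXT)
			i++;
	}
	if (bodies == 0)
		return 0;
	if (i < n && v[i] == C_TAIL) {		/* optional TAIL */
		i++;
		if (i < n && v[i] == C_TEXT)
			i++;
	}
	return i == n;				/* nothing may follow */
}
```

A multi-body sequence such as HEAD BODY BODY corresponds to the
.Sq \&.Bl \-column
case described below.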
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these contain punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&.Bl \-column ,
where a new body introduces a new phrase.
.Ss Badly-nested Blocks
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
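.Pp
Going by the description above, a front-end can recognise an ENDBODY
node and check its invariants from its fields alone.
The sketch below uses a hypothetical trimmed node structure in which
.Va end
is an int (zero meaning
.Dq not an ENDBODY )
and
.Va pending
is a node pointer; the real field types in
.Pa mdoc.h
may differ, and endbody_ok is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

enum ntype { N_BLOCK, N_HEAD, N_BODY, N_TEXT };

/* Hypothetical subset of node fields relevant to ENDBODY handling. */
struct node {
	enum ntype	 type;
	int		 tok;		/* generating macro */
	int		 end;		/* non-zero only for ENDBODY nodes */
	struct node	*pending;	/* BODY of the block being ended */
};

/*
 * Check the invariants described above: an ENDBODY is BODY-typed, has
 * a non-zero end field, and its pending pointer refers to a BODY node
 * with the same tok.
 */
static int
endbody_ok(const struct node *n)
{
	assert(n != NULL);
	if (n->type != N_BODY || n->end == 0)
		return 0;
	return n->pending != NULL &&
	    n->pending->type == N_BODY &&
	    n->pending->tok == n->tok;
}
```

An ordinary BODY node fails the check because its end field is zero,
which is how this sketch tells the two kinds of BODY node apart.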
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Sq \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Sq \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged :
the
.Fl T Ns Cm html
and
.Fl T Ns Cm xhtml
front-ends are unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh EXAMPLES
The following example reads lines from stdin and parses them, operating
on the finished parse tree with a caller-supplied function,
.Fn parsed .
For brevity, this example does not check the return value of
.Fn mdoc_alloc
and does not free memory upon failure.
.Bd -literal -offset indent
struct regset regs;
struct mdoc *mdoc;
const struct mdoc_node *node;
char *buf;
size_t alloc_len;
ssize_t len;
int line;

bzero(&regs, sizeof(struct regset));
line = 1;
mdoc = mdoc_alloc(&regs, NULL, 0, NULL);
buf = NULL;
alloc_len = 0;

while ((len = getline(&buf, &alloc_len, stdin)) != -1) {
	/* mdoc_parseln() expects no trailing newline. */
	if (len > 0 && buf[len - 1] == '\en')
		buf[len - 1] = '\e0';
	if ( ! mdoc_parseln(mdoc, line, buf))
		errx(1, "mdoc_parseln");
	line++;
}
free(buf);

if ( ! mdoc_endparse(mdoc))
	errx(1, "mdoc_endparse");
if (NULL == (node = mdoc_node(mdoc)))
	errx(1, "mdoc_node");

parsed(mdoc, node);
mdoc_free(mdoc);
.Ed
.Pp
Please see
.Pa main.c
in the source archive for a more complete reference.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mdoc 7
.Sh AUTHORS
The
.Nm
library was written by
.An Kristaps Dzonsons Aq kristaps@bsd.lv .