.\" $Id: mdoc.3,v 1.55 2011/01/07 15:07:21 kristaps Exp $
.\"
.\" Copyright (c) 2009, 2010 Kristaps Dzonsons <kristaps@bsd.lv>
.\" Copyright (c) 2010 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: January 7 2011 $
.Dt MDOC 3
.Os
.Sh NAME
.Nm mdoc ,
.Nm mdoc_alloc ,
.Nm mdoc_endparse ,
.Nm mdoc_free ,
.Nm mdoc_meta ,
.Nm mdoc_node ,
.Nm mdoc_parseln ,
.Nm mdoc_reset
.Nd mdoc macro compiler library
.Sh SYNOPSIS
.In mandoc.h
.In mdoc.h
.Vt extern const char * const * mdoc_macronames;
.Vt extern const char * const * mdoc_argnames;
.Ft int
.Fo mdoc_addspan
.Fa "struct mdoc *mdoc"
.Fa "const struct tbl_span *span"
.Fc
.Ft "struct mdoc *"
.Fo mdoc_alloc
.Fa "struct regset *regs"
.Fa "void *data"
.Fa "mandocmsg msgs"
.Fc
.Ft int
.Fn mdoc_endparse "struct mdoc *mdoc"
.Ft void
.Fn mdoc_free "struct mdoc *mdoc"
.Ft "const struct mdoc_meta *"
.Fn mdoc_meta "const struct mdoc *mdoc"
.Ft "const struct mdoc_node *"
.Fn mdoc_node "const struct mdoc *mdoc"
.Ft int
.Fo mdoc_parseln
.Fa "struct mdoc *mdoc"
.Fa "int line"
.Fa "char *buf"
.Fc
.Ft int
.Fn mdoc_reset "struct mdoc *mdoc"
.Sh DESCRIPTION
The
.Nm mdoc
library parses lines of
.Xr mdoc 7
input
into an abstract syntax tree (AST).
.Pp
In general, applications initiate a parsing sequence with
.Fn mdoc_alloc ,
parse each line in a document with
.Fn mdoc_parseln ,
close the parsing session with
.Fn mdoc_endparse ,
operate over the syntax tree returned by
.Fn mdoc_node
and
.Fn mdoc_meta ,
then free all allocated memory with
.Fn mdoc_free .
The
.Fn mdoc_reset
function may be used in order to reset the parser for another input
sequence.
.Ss Types
.Bl -ohang
.It Vt struct mdoc
An opaque type.
Its values are only used privately within the library.
.It Vt struct mdoc_node
A parsed node.
See
.Sx Abstract Syntax Tree
for details.
.El
.Ss Functions
If
.Fn mdoc_addspan ,
.Fn mdoc_parseln ,
or
.Fn mdoc_endparse
returns 0, any subsequent call to a function other than
.Fn mdoc_reset
or
.Fn mdoc_free
will raise an assertion.
.Bl -ohang
.It Fn mdoc_addspan
Add a table span to the parsing stream.
Returns 0 on failure, 1 on success.
.It Fn mdoc_alloc
Allocates a parsing structure.
The
.Fa data
pointer is passed to
.Fa msgs .
Always returns a valid pointer.
The pointer must be freed with
.Fn mdoc_free .
.It Fn mdoc_reset
Reset the parser for another parse routine.
After its use,
.Fn mdoc_parseln
behaves as if invoked for the first time.
If it returns 0, memory could not be allocated.
.It Fn mdoc_free
Free all resources of a parser.
The pointer is no longer valid after invocation.
.It Fn mdoc_parseln
Parse a NUL-terminated line of input.
The line should not include the trailing newline.
Returns 0 on failure, 1 on success.
The input buffer
.Fa buf
is modified by this function.
.It Fn mdoc_endparse
Signals that the parse is complete.
Returns 0 on failure, 1 on success.
.It Fn mdoc_node
Returns the first node of the parse.
.It Fn mdoc_meta
Returns the document's parsed meta-data.
.El
.Ss Variables
.Bl -ohang
.It Va mdoc_macronames
An array of string-ified token names.
.It Va mdoc_argnames
An array of string-ified token argument names.
.El
.Ss Abstract Syntax Tree
The
.Nm
functions produce an abstract syntax tree (AST) describing input in a
regular form.
It may be reviewed at any time with
.Fn mdoc_node ;
however, if called before
.Fn mdoc_endparse ,
or after
.Fn mdoc_endparse
or
.Fn mdoc_parseln
fails, it may be incomplete.
.Pp
This AST is governed by the ontological
rules dictated in
.Xr mdoc 7
and derives its terminology accordingly.
.Qq In-line
elements described in
.Xr mdoc 7
are described simply as
.Qq elements .
.Pp
The AST is composed of
.Vt struct mdoc_node
nodes with block, head, body, element, root and text types as declared
by the
.Va type
field.
Each node also provides its parse point (the
.Va line ,
.Va sec ,
and
.Va pos
fields), its position in the tree (the
.Va parent ,
.Va child ,
.Va nchild ,
.Va next
and
.Va prev
fields) and some type-specific data, in particular, for nodes generated
from macros, the generating macro in the
.Va tok
field.
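.Pp
As a rough illustration of how the
.Va parent ,
.Va child
and
.Va next
links fit together, the sketch below builds and walks a small tree.
It is not library code:
.Vt struct demo_node
and the
.Fn demo_*
functions are hypothetical stand-ins that mirror only the link fields
named above (omitting
.Va nchild
and
.Va prev ) ,
so that the example compiles without
.In mdoc.h .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mdoc_node: link fields only. */
struct demo_node {
    struct demo_node *parent;   /* enclosing node, NULL for the root */
    struct demo_node *child;    /* first child, or NULL */
    struct demo_node *next;     /* next sibling, or NULL */
};

/* Append n as the last child of p. */
void
demo_append(struct demo_node *p, struct demo_node *n)
{
    struct demo_node *c;

    assert(p != NULL && n != NULL);
    n->parent = p;
    n->next = NULL;
    if (p->child == NULL) {
        p->child = n;
        return;
    }
    for (c = p->child; c->next != NULL; c = c->next)
        continue;
    c->next = n;
}

/* Count n and all of its descendants, depth first. */
size_t
demo_count(const struct demo_node *n)
{
    const struct demo_node *c;
    size_t total;

    if (n == NULL)
        return 0;
    total = 1;
    for (c = n->child; c != NULL; c = c->next)
        total += demo_count(c);
    return total;
}

/* Distance from n up to the root, following parent links. */
size_t
demo_depth(const struct demo_node *n)
{
    size_t depth;

    for (depth = 0; n != NULL && n->parent != NULL; n = n->parent)
        depth++;
    return depth;
}
```
.Ed
.Pp
A real consumer would presumably apply the same child-then-sibling
recursion to the node returned by
.Fn mdoc_node ,
dispatching on each node's
.Va type
and
.Va tok .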
.Pp
The tree itself is arranged according to the following normal form,
where capitalised non-terminals represent nodes.
.Pp
.Bl -tag -width "ELEMENTXX" -compact
.It ROOT
\(<- mnode+
.It mnode
\(<- BLOCK | ELEMENT | TEXT
.It BLOCK
\(<- HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]]
.It ELEMENT
\(<- TEXT*
.It HEAD
\(<- mnode*
.It BODY
\(<- mnode* [ENDBODY mnode*]
.It TAIL
\(<- mnode*
.It TEXT
\(<- [[:printable:],0x1e]*
.El
.Pp
Of note are the TEXT nodes following the HEAD, BODY and TAIL nodes of
the BLOCK production: these refer to punctuation marks.
Furthermore, although a TEXT node will generally have a non-zero-length
string, in the specific case of
.Sq \&.Bd \-literal ,
an empty line will produce a zero-length string.
Multiple body parts are only found in invocations of
.Sq \&Bl \-column ,
where a new body introduces a new phrase.
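.Pp
The BLOCK production above can be read as a simple sequence check over
a block's direct children.
The following sketch is only an illustration under stated assumptions:
.Vt enum demo_type
and
.Fn demo_block_ok
are hypothetical names, not part of the library, and the real type
constants are those declared in
.In mdoc.h .
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type tags standing in for the library's node types. */
enum demo_type { DEMO_HEAD, DEMO_BODY, DEMO_TAIL, DEMO_TEXT };

/*
 * Return 1 if the n child types in t match
 * HEAD [TEXT] (BODY [TEXT])+ [TAIL [TEXT]], else 0.
 */
int
demo_block_ok(const enum demo_type *t, size_t n)
{
    size_t i, bodies;

    assert(t != NULL || n == 0);

    /* Exactly one leading HEAD, optionally followed by TEXT. */
    if (n == 0 || t[0] != DEMO_HEAD)
        return 0;
    i = 1;
    if (i < n && t[i] == DEMO_TEXT)
        i++;

    /* One or more BODY nodes, each optionally followed by TEXT. */
    bodies = 0;
    while (i < n && t[i] == DEMO_BODY) {
        bodies++;
        i++;
        if (i < n && t[i] == DEMO_TEXT)
            i++;
    }
    if (bodies == 0)
        return 0;

    /* Optional TAIL, optionally followed by TEXT. */
    if (i < n && t[i] == DEMO_TAIL) {
        i++;
        if (i < n && t[i] == DEMO_TEXT)
            i++;
    }

    /* Anything left over violates the production. */
    return i == n;
}
```
.Ed
.Pp
The check covers only the BLOCK rule; it does not descend into HEAD,
BODY or TAIL, and it does not model ENDBODY.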
.Ss Badly-nested Blocks
The ENDBODY node is available to end the formatting associated
with a given block before the physical end of that block.
It has a non-null
.Va end
field, is of the BODY
.Va type ,
has the same
.Va tok
as the BLOCK it is ending, and has a
.Va pending
field pointing to that BLOCK's BODY node.
It is an indirect child of that BODY node
and has no children of its own.
.Pp
An ENDBODY node is generated when a block ends while one of its child
blocks is still open, as in the following example:
.Bd -literal -offset indent
\&.Ao ao
\&.Bo bo ac
\&.Ac bc
\&.Bc end
.Ed
.Pp
This example results in the following block structure:
.Bd -literal -offset indent
BLOCK Ao
    HEAD Ao
    BODY Ao
        TEXT ao
        BLOCK Bo, pending -> Ao
            HEAD Bo
            BODY Bo
                TEXT bo
                TEXT ac
                ENDBODY Ao, pending -> Ao
                TEXT bc
TEXT end
.Ed
.Pp
Here, the formatting of the
.Sq \&Ao
block extends from TEXT ao to TEXT ac,
while the formatting of the
.Sq \&Bo
block extends from TEXT bo to TEXT bc.
It renders as follows in
.Fl T Ns Cm ascii
mode:
.Pp
.Dl <ao [bo ac> bc] end
.Pp
Support for badly-nested blocks is only provided for backward
compatibility with some older
.Xr mdoc 7
implementations.
Using badly-nested blocks is
.Em strongly discouraged :
the
.Fl T Ns Cm html
and
.Fl T Ns Cm xhtml
front-ends are unable to render them in any meaningful way.
Furthermore, behaviour when encountering badly-nested blocks is not
consistent across troff implementations, especially when using multiple
levels of badly-nested blocks.
.Sh EXAMPLES
The following example reads lines from stdin and parses them, operating
on the finished parse tree with
.Fn parsed .
For brevity, it exits on error without freeing the parser.
.Bd -literal -offset indent
struct regset regs;
struct mdoc *mdoc;
const struct mdoc_node *node;
char *buf;
size_t alloc_len;
ssize_t len;
int line;

bzero(&regs, sizeof(struct regset));
line = 1;
mdoc = mdoc_alloc(&regs, NULL, NULL);
buf = NULL;
alloc_len = 0;

/* getline() returns -1 at end of input, so len must be signed. */
while ((len = getline(&buf, &alloc_len, stdin)) >= 0) {
    /* Strip the trailing newline before parsing. */
    if (len && buf[len - 1] == '\en')
        buf[len - 1] = '\e0';
    if ( ! mdoc_parseln(mdoc, line, buf))
        errx(1, "mdoc_parseln");
    line++;
}
free(buf);

if ( ! mdoc_endparse(mdoc))
    errx(1, "mdoc_endparse");
if (NULL == (node = mdoc_node(mdoc)))
    errx(1, "mdoc_node");

parsed(mdoc, node);
mdoc_free(mdoc);
.Ed
.Pp
To compile this, execute
.Pp
.Dl % cc main.c libmdoc.a libmandoc.a
.Pp
where
.Pa main.c
is the example file.
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mdoc 7
.Sh AUTHORS
The
.Nm
library was written by
.An Kristaps Dzonsons Aq kristaps@bsd.lv .