.\" $Id: mandoc_html.3,v 1.23 2020/04/24 13:13:06 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: April 24 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Ar fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
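.Pp
As a hedged illustration of the format letters, a language-specific
formatter might open elements as sketched below, assuming the headers from the
.Sx SYNOPSIS
are included.
The function name
.Fn example_otag
and the class names are hypothetical, and
.Dv TAG_DIV
and
.Dv TAG_A
are assumed to be among the constants declared in
.Pa html.h .
.Bd -literal -offset indent
static void
example_otag(struct html *h)
{
	struct tag	*t;

	/* Block element with a class only; keep t to close it later. */
	t = print_otag(h, TAG_DIV, "c", "Bd");

	/* Local link: class, then an href that gets a '#' prefix. */
	print_otag(h, TAG_A, "chR", "Sx", "DESCRIPTION");
	print_text(h, "DESCRIPTION");
	print_stagq(h, t);	/* close the link, keep the block */

	/* Manual page link: class, then page name and section. */
	print_otag(h, TAG_A, "chM", "Xr", "mandoc", "1");
	print_text(h, "mandoc(1)");

	/* Close everything up to and including the block. */
	print_tagq(h, t);
}
.Ed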
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
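.Pp
The difference between the two closing functions can be sketched as
follows, assuming the headers from the
.Sx SYNOPSIS
are included.
The function name
.Fn example_close
and the class name are hypothetical, and
.Dv TAG_DIV
and
.Dv TAG_B
are assumed to be declared in
.Pa html.h .
.Bd -literal -offset indent
static void
example_close(struct html *h)
{
	struct tag	*outer;

	outer = print_otag(h, TAG_DIV, "c", "Bd");
	print_otag(h, TAG_B, "");
	print_text(h, "bold text");

	/* Closes the inner element only; outer stays open. */
	print_stagq(h, outer);
	print_text(h, "plain text");

	/* Closes outer itself. */
	print_tagq(h, outer);
}
.Ed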
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
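.Pp
Because the previous mode is returned, a caller can switch modes
temporarily and restore the old one afterwards.
The following is only a sketch under that assumption; the function name
.Fn example_nofill
is hypothetical, and the headers from the
.Sx SYNOPSIS
are assumed to be included.
.Bd -literal -offset indent
static void
example_nofill(struct html *h)
{
	enum roff_tok	 save;

	/* Enter no-fill mode and remember the mode active before. */
	save = html_fillmode(h, ROFF_nf);
	print_text(h, "literal line");

	/* Go back to whichever mode was active on entry. */
	html_fillmode(h, save);
}
.Ed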
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
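.Pp
A minimal sketch of a temporary font change, assuming the headers from the
.Sx SYNOPSIS
are included; the function name
.Fn example_font
is hypothetical, and the return value of
.Fn html_setfont
is deliberately ignored here.
.Bd -literal -offset indent
static void
example_font(struct html *h)
{
	/* Only internal state changes; nothing is written yet. */
	html_setfont(h, ESCAPE_FONTBOLD);
	print_text(h, "important");

	/* Exchange the current and the previous font again. */
	html_setfont(h, ESCAPE_FONTPREV);
	print_text(h, "ordinary");
}
.Ed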
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
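.Pp
The ownership rules for the returned string can be sketched as follows,
assuming the headers from the
.Sx SYNOPSIS
are included.
The function name
.Fn example_id
and the class name are hypothetical, and
.Dv TAG_SPAN
is assumed to be declared in
.Pa html.h .
.Bd -literal -offset indent
#include <stdlib.h>

static struct tag *
example_id(struct html *h, const struct roff_node *n)
{
	struct tag	*t;
	char		*id;

	/* First call for this node: request a deduplicated id. */
	id = html_make_id(n, 1);

	/* A NULL id merely causes the attribute to be omitted. */
	t = print_otag(h, TAG_SPAN, "ci", "Fl", id);

	/* The caller owns the string; free(NULL) is harmless. */
	free(id);
	return t;
}
.Ed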
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
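.Pp
In a formatter, the wrapper might be used as sketched below, assuming
the headers from the
.Sx SYNOPSIS
are included.
The function name
.Fn example_otag_id
and the class name are hypothetical, and
.Dv TAG_CODE
is assumed to be declared in
.Pa html.h .
.Bd -literal -offset indent
static void
example_otag_id(struct html *h, struct roff_node *n)
{
	struct tag	*t;

	/* May open one or two elements, depending on the flags in n. */
	t = print_otag_id(h, TAG_CODE, "Fl", n);
	print_text(h, "-v");

	/* t is the outer element, so this closes whatever was opened. */
	print_tagq(h, t);
}
.Ed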
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.