.\" $Id: mandoc_html.3,v 1.24 2022/06/24 11:15:53 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: June 24 2022 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
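.Pp
As a minimal sketch of this rule, a language-specific formatter would
hand each word to
.Fn print_text
rather than writing it itself.
The helper name
.Fn emit_word
is hypothetical and only serves as an illustration:
.Bd -literal -offset indent
static void
emit_word(struct html *h, const char *word)
{
	/* Avoid: printf("%s", word) would bypass HTML encoding. */
	print_text(h, word);	/* encodes before writing */
}
.Ed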
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a Copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag
which takes care of properly encoding attributes,
which is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Vt char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm r
Print an ARIA
.Cm role
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.It Cm s
Print a
.Cm style
attribute.
If present, it must be the last format letter.
It requires two
.Vt char *
arguments.
The first is the name of the style property, the second its value.
The name must not be
.Dv NULL .
The
.Cm s
.Fa fmt
letter can be repeated, each repetition requiring an additional pair of
.Vt char *
arguments.
.El
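.Pp
The following calls sketch how format letters and arguments pair up.
The tag constants
.Dv TAG_A ,
.Dv TAG_DIV ,
and
.Dv TAG_SPAN
and all attribute values are assumptions chosen for illustration;
consult
.Pa html.h
for the tags that are actually available.
Each call opens an element that still has to be closed later:
.Bd -literal -offset indent
/* No attributes at all. */
print_otag(h, TAG_DIV, "");

/* class and id; a NULL argument suppresses its attribute. */
print_otag(h, TAG_DIV, "ci", "example", NULL);

/* class and local href: the R modifier prefixes '#'. */
print_otag(h, TAG_A, "chR", "permalink", "NAME");

/* href to a manual page: the M modifier takes name and section. */
print_otag(h, TAG_A, "chM", "Xr", "mandoc", "1");

/* Arbitrary attribute, then two style properties placed last. */
print_otag(h, TAG_SPAN, "?ss", "lang", "en",
    "font-weight", "bold", "margin-left", "2em");
.Ed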
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
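.Pp
The difference between the two closing functions can be sketched
as follows.
The tag constants
.Dv TAG_DIV
and
.Dv TAG_B
and the class name are assumptions made for illustration:
.Bd -literal -offset indent
struct tag *outer;

outer = print_otag(h, TAG_DIV, "c", "example");
print_otag(h, TAG_B, "");
print_text(h, "inner");

/* Close everything opened after outer, but keep outer open. */
print_stagq(h, outer);
print_text(h, "still inside outer");

/* Close outer itself, including anything still open inside it. */
print_tagq(h, outer);
.Ed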
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
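.Pp
Because the previous mode is returned, a caller can switch modes
temporarily and restore the earlier state afterwards.
The following is a sketch of that pattern, not code taken from the
formatters:
.Bd -literal -offset indent
enum roff_tok save;

/* Force no-fill mode and remember what was active before. */
save = html_fillmode(h, ROFF_nf);
print_text(h, "verbatim line");

/* Restore the earlier mode; a no-op if it was already no-fill. */
html_fillmode(h, save);
.Ed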
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
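.Pp
A minimal sketch of selecting a font for one word and then returning
to the font that was active before, assuming that
.Fn print_text
is used for the text output:
.Bd -literal -offset indent
/* Only records the request; nothing is written yet. */
html_setfont(h, ESCAPE_FONTBOLD);
print_text(h, "important");

/* Exchange the current and the previous font again. */
html_setfont(h, ESCAPE_FONTPREV);
print_text(h, "ordinary");
.Ed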
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a segment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a segment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
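.Pp
Since the returned string is allocated, the caller has to check for
.Dv NULL
and release it with
.Xr free 3 ,
which requires
.In stdlib.h .
The following sketch assumes that
.Fa n
is a node with text children; the tag constant
.Dv TAG_SPAN
is an assumption made for illustration:
.Bd -literal -offset indent
char *id;

/* First call for this node: request a deduplicated identifier. */
if ((id = html_make_id(n, 1)) != NULL) {
	print_otag(h, TAG_SPAN, "i", id);
	free(id);	/* the caller owns the string */
}
.Ed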
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
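.Pp
A sketch of typical use follows.
Because the pointer to the outer element is returned when two
elements are opened, one
.Fn print_tagq
call closes both.
The tag constant
.Dv TAG_DIV
and the class name are assumptions made for illustration:
.Bd -literal -offset indent
struct tag *t;

/* May add id and permalink depending on the flags set in n. */
t = print_otag_id(h, TAG_DIV, "example", n);
print_text(h, "content");

/* Closes the element and the permalink, if one was opened. */
print_tagq(h, t);
.Ed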
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.
554 It is maintained by
555 .An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
556 who also wrote this manual.