.\" $Id: mandoc_html.3,v 1.22 2020/04/19 15:16:56 schwarze Exp $
.\"
.\" Copyright (c) 2014, 2017, 2018 Ingo Schwarze <schwarze@openbsd.org>
.\"
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
.Dd $Mdocdate: April 19 2020 $
.Dt MANDOC_HTML 3
.Os
.Sh NAME
.Nm mandoc_html
.Nd internals of the mandoc HTML formatter
.Sh SYNOPSIS
.In sys/types.h
.Fd #include """mandoc.h"""
.Fd #include """roff.h"""
.Fd #include """out.h"""
.Fd #include """html.h"""
.Ft void
.Fn print_gen_decls "struct html *h"
.Ft void
.Fn print_gen_comment "struct html *h" "struct roff_node *n"
.Ft void
.Fn print_gen_head "struct html *h"
.Ft struct tag *
.Fo print_otag
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *fmt"
.Fa ...
.Fc
.Ft void
.Fo print_tagq
.Fa "struct html *h"
.Fa "const struct tag *until"
.Fc
.Ft void
.Fo print_stagq
.Fa "struct html *h"
.Fa "const struct tag *suntil"
.Fc
.Ft void
.Fn html_close_paragraph "struct html *h"
.Ft enum roff_tok
.Fo html_fillmode
.Fa "struct html *h"
.Fa "enum roff_tok want"
.Fc
.Ft int
.Fo html_setfont
.Fa "struct html *h"
.Fa "enum mandoc_esc font"
.Fc
.Ft void
.Fo print_text
.Fa "struct html *h"
.Fa "const char *word"
.Fc
.Ft void
.Fo print_tagged_text
.Fa "struct html *h"
.Fa "const char *word"
.Fa "struct roff_node *n"
.Fc
.Ft char *
.Fo html_make_id
.Fa "const struct roff_node *n"
.Fa "int unique"
.Fc
.Ft struct tag *
.Fo print_otag_id
.Fa "struct html *h"
.Fa "enum htmltag tag"
.Fa "const char *cattr"
.Fa "struct roff_node *n"
.Fc
.Ft void
.Fn print_endline "struct html *h"
.Sh DESCRIPTION
The mandoc HTML formatter is not a formal library.
However, as it is compiled into more than one program, in particular
.Xr mandoc 1
and
.Xr man.cgi 8 ,
and because it may be security-critical in some contexts,
some documentation is useful to help to use it correctly and
to prevent XSS vulnerabilities.
.Pp
The formatter produces HTML output on the standard output.
Since proper escaping is usually required and best taken care of
at one central place, the language-specific formatters
.Po
.Pa *_html.c ,
see
.Sx FILES
.Pc
are not supposed to print directly to
.Dv stdout
using functions like
.Xr printf 3 ,
.Xr putc 3 ,
.Xr puts 3 ,
or
.Xr write 2 .
Instead, they are expected to use the output functions declared in
.Pa html.h
and implemented as part of the main HTML formatting engine in
.Pa html.c .
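.Pp
As a rough illustration of why escaping is best done in one place,
the following sketch replaces the characters that are special
in HTML text and attribute values.
The function
.Fn toy_encode
is hypothetical and is not the private
.Fn print_encode
function, which may handle more cases than shown here.
.Bd -literal -offset indent
```c
#include <string.h>

/*
 * Hypothetical encoder, not part of html.h: copy src into dst,
 * replacing the four characters that are special in HTML text
 * and attribute values.
 * Returns 0 on success or -1 if dst is too small.
 */
int
toy_encode(char *dst, size_t dstsz, const char *src)
{
	const char *rep;
	char one[2] = { 0, 0 };
	size_t len, used = 0;

	for (; *src != 0; src++) {
		switch (*src) {
		case '<': rep = "&lt;"; break;
		case '>': rep = "&gt;"; break;
		case '&': rep = "&amp;"; break;
		case '"': rep = "&quot;"; break;
		default:
			one[0] = *src;	/* ordinary byte, copied as is */
			rep = one;
			break;
		}
		len = strlen(rep);
		if (used + len + 1 > dstsz)
			return -1;
		memcpy(dst + used, rep, len);
		used += len;
	}
	if (dstsz == 0)
		return -1;
	dst[used] = 0;	/* terminate the output string */
	return 0;
}
```
.Ed
A formatter that bypasses such a central function and prints
document text directly would have to repeat this replacement at
every call site, which is where XSS bugs tend to creep in.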
.Ss Data structures
These structures are declared in
.Pa html.h .
.Bl -tag -width Ds
.It Vt struct html
Internal state of the HTML formatter.
.It Vt struct tag
One entry for the LIFO stack of HTML elements.
Members include
.Fa "enum htmltag tag"
and
.Fa "struct tag *next" .
.El
.Ss Private interface functions
The function
.Fn print_gen_decls
prints the opening
.Aq Pf \&! Ic DOCTYPE
declaration.
.Pp
The function
.Fn print_gen_comment
prints the leading comments, usually containing a copyright notice
and license, as an HTML comment.
It is intended to be called right after opening the
.Aq Ic HTML
element.
Pass the first
.Dv ROFFT_COMMENT
node in
.Fa n .
.Pp
The function
.Fn print_gen_head
prints the opening
.Aq Ic META
and
.Aq Ic LINK
elements for the document
.Aq Ic HEAD ,
using the
.Fa style
member of
.Fa h
unless that is
.Dv NULL .
It uses
.Fn print_otag ,
which takes care of properly encoding attributes;
this is relevant for the
.Fa style
link in particular.
.Pp
The function
.Fn print_otag
prints the start tag of an HTML element with the name
.Fa tag ,
optionally including the attributes specified by
.Fa fmt .
If
.Fa fmt
is the empty string, no attributes are written.
Each letter of
.Fa fmt
specifies one attribute to write.
Most attributes require one
.Va char *
argument which becomes the value of the attribute.
The arguments have to be given in the same order as the attribute letters.
If an argument is
.Dv NULL ,
the respective attribute is not written.
.Bl -tag -width 1n -offset indent
.It Cm c
Print a
.Cm class
attribute.
.It Cm h
Print a
.Cm href
attribute.
This attribute letter can optionally be followed by a modifier letter.
If followed by
.Cm R ,
it formats the link as a local one by prefixing a
.Sq #
character.
If followed by
.Cm I ,
it interprets the argument as a header file name
and generates a link using the
.Xr mandoc 1
.Fl O Cm includes
option.
If followed by
.Cm M ,
it takes two arguments instead of one, a manual page name and
section, and formats them as a link to a manual page using the
.Xr mandoc 1
.Fl O Cm man
option.
.It Cm i
Print an
.Cm id
attribute.
.It Cm \&?
Print an arbitrary attribute.
This format letter requires two
.Vt char *
arguments, the attribute name and the value.
The name must not be
.Dv NULL .
.El
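.Pp
To make the argument order rule concrete, the following sketch counts
how many
.Vt char *
arguments a given
.Fa fmt
string consumes according to the list above.
The function
.Fn fmt_argc
is hypothetical and not part of
.Pa html.h ;
it models only the letters described in this manual and does not
reproduce the parser in
.Pa html.c ,
which may accept letters not listed here.
.Bd -literal -offset indent
```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of html.h: count the char *
 * arguments that a print_otag() format string consumes,
 * following only the rules listed in this manual.
 * Returns -1 for a letter that is not described there.
 */
int
fmt_argc(const char *fmt)
{
	int n = 0;

	while (*fmt != 0) {
		switch (*fmt++) {
		case 'c':	/* class: one value */
		case 'i':	/* id: one value */
			n += 1;
			break;
		case 'h':	/* href, optional modifier letter */
			if (*fmt == 'M') {
				n += 2;	/* page name and section */
				fmt++;
			} else {
				if (*fmt == 'R' || *fmt == 'I')
					fmt++;
				n += 1;
			}
			break;
		case '?':	/* attribute name and value */
			n += 2;
			break;
		default:
			return -1;
		}
	}
	return n;
}
```
.Ed
Under these rules, the format
.Qq chR
consumes two arguments, a class name followed by a fragment
identifier, in that order.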
.Pp
.Fn print_otag
uses the private function
.Fn print_encode
to take care of HTML encoding.
If required by the element type, it remembers in
.Fa h
that the element is open.
The function
.Fn print_tagq
is used to close out all open elements up to and including
.Fa until ;
.Fn print_stagq
is a variant to close out all open elements up to but excluding
.Fa suntil .
The function
.Fn html_close_paragraph
closes all open elements that establish phrasing context,
thus returning to the innermost flow context.
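.Pp
The difference between closing up to and including an element and
closing up to but excluding it can be sketched with a toy LIFO stack.
All names below are hypothetical;
.Vt struct toy_tag
is a simplified stand-in for
.Vt struct tag ,
and the functions only pop entries instead of printing end tags.
.Bd -literal -offset indent
```c
#include <stdlib.h>

/* Toy model of the element stack; not the real struct tag. */
struct toy_tag {
	int		 name;
	struct toy_tag	*next;	/* element opened before this one */
};

/* Push a new innermost element; returns NULL if malloc fails. */
struct toy_tag *
toy_open(struct toy_tag **top, int name)
{
	struct toy_tag *t;

	if ((t = malloc(sizeof(*t))) == NULL)
		return NULL;
	t->name = name;
	t->next = *top;
	*top = t;
	return t;
}

/* Modeled on print_tagq(): close up to and including until. */
int
toy_close_until(struct toy_tag **top, const struct toy_tag *until)
{
	struct toy_tag *t;
	int closed = 0, last = 0;

	while (*top != NULL && !last) {
		t = *top;
		last = (t == until);
		*top = t->next;
		free(t);
		closed++;
	}
	return closed;
}

/* Modeled on print_stagq(): close up to but excluding suntil. */
int
toy_close_suntil(struct toy_tag **top, const struct toy_tag *suntil)
{
	struct toy_tag *t;
	int closed = 0;

	while (*top != NULL && *top != suntil) {
		t = *top;
		*top = t->next;
		free(t);
		closed++;
	}
	return closed;
}
```
.Ed
In this model, the excluding variant leaves its argument as the
innermost open element, while the including variant removes it, too.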
.Pp
The function
.Fn html_fillmode
switches to fill mode if
.Fa want
is
.Dv ROFF_fi
or to no-fill mode if
.Fa want
is
.Dv ROFF_nf .
Switching from fill mode to no-fill mode closes the current paragraph
and opens a
.Aq Ic PRE
element.
Switching in the opposite direction closes the
.Aq Ic PRE
element, but does not open a new paragraph.
If
.Fa want
matches the mode that is already active, no elements are closed or opened.
If
.Fa want
is
.Dv TOKEN_NONE ,
the mode remains as it is.
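.Pp
The mode switching rules, together with the return value described in
.Sx RETURN VALUES ,
can be modeled as a small state machine.
This is only a sketch: the type and function names are hypothetical,
and two integer flags stand in for the open paragraph and the open
.Aq Ic PRE
element instead of real HTML output.
.Bd -literal -offset indent
```c
/* Toy stand-ins for ROFF_fi, ROFF_nf, and TOKEN_NONE. */
enum toy_tok { TOY_fi, TOY_nf, TOY_NONE };

struct toy_html {
	int	nofill;	/* non-zero while a PRE element is open */
	int	in_par;	/* non-zero while a paragraph is open */
};

/*
 * Return the mode that was active before the call and switch
 * only if a different mode is wanted.
 */
enum toy_tok
toy_fillmode(struct toy_html *h, enum toy_tok want)
{
	enum toy_tok had;

	had = h->nofill ? TOY_nf : TOY_fi;
	if (want == TOY_NONE || want == had)
		return had;	/* nothing closed, nothing opened */
	if (want == TOY_nf) {
		h->in_par = 0;	/* close the current paragraph */
		h->nofill = 1;	/* open PRE */
	} else
		h->nofill = 0;	/* close PRE, open no paragraph */
	return had;
}
```
.Ed
Because the previous mode is returned, a caller can switch modes
temporarily and later restore the saved mode with a second call.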
.Pp
The function
.Fn html_setfont
selects the
.Fa font ,
which can be
.Dv ESCAPE_FONTROMAN ,
.Dv ESCAPE_FONTBOLD ,
.Dv ESCAPE_FONTITALIC ,
.Dv ESCAPE_FONTBI ,
or
.Dv ESCAPE_FONTCW ,
for future text output and internally remembers
the font that was active before the change.
If the
.Fa font
argument is
.Dv ESCAPE_FONTPREV ,
the current and the previous font are exchanged.
This function only changes the internal state of the
.Fa h
object; no HTML elements are written yet.
Subsequent text output will write font elements when needed.
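.Pp
The bookkeeping of the current and the previous font can be sketched
as follows.
The names are hypothetical stand-ins for the
.Dv ESCAPE_FONT*
constants and for the relevant part of
.Vt struct html ;
the return value of
.Fn html_setfont
is not modeled.
.Bd -literal -offset indent
```c
/* Toy stand-ins for the ESCAPE_FONT* constants. */
enum toy_font {
	TOY_ROMAN, TOY_BOLD, TOY_ITALIC, TOY_BI, TOY_CW, TOY_PREV
};

struct toy_fonts {
	enum toy_font	cur;	/* font for future text output */
	enum toy_font	prev;	/* font active before the change */
};

/*
 * Select a font and remember the one it replaces.
 * For TOY_PREV, the current and previous fonts are exchanged.
 * Only state changes; nothing is printed.
 */
void
toy_setfont(struct toy_fonts *f, enum toy_font font)
{
	enum toy_font old = f->cur;

	f->cur = (font == TOY_PREV) ? f->prev : font;
	f->prev = old;
}
```
.Ed
In this model, requesting the previous font twice in a row
returns to the font that was current before the first request.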
.Pp
The function
.Fn print_text
prints HTML element content.
It uses the private function
.Fn print_encode
to take care of HTML encoding.
If the document has requested a non-standard font, for example using a
.Xr roff 7
.Ic \ef
font escape sequence,
.Fn print_text
wraps
.Fa word
in an HTML font selection element using the
.Fn print_otag
and
.Fn print_tagq
functions.
.Pp
The function
.Fn print_tagged_text
is a variant of
.Fn print_text
that wraps
.Fa word
in an
.Aq Ic A
element of class
.Qq permalink
if
.Fa n
is not
.Dv NULL
and yields a fragment identifier when passed to
.Fn html_make_id .
.Pp
The function
.Fn html_make_id
allocates a string to be used for the
.Cm id
attribute of an HTML element and/or as a fragment identifier for a URI in an
.Aq Ic A
element.
If
.Fa n
contains a
.Fa tag
attribute, it is used; otherwise, child nodes are used.
If
.Fa n
is an
.Ic \&Sh ,
.Ic \&Ss ,
.Ic \&Sx ,
.Ic SH ,
or
.Ic SS
node, the resulting string is the concatenation of the child strings;
for other node types, only the first child is used.
Bytes not permitted in URI-fragment strings are replaced by underscores.
If any of the children to be used is not a text node,
no string is generated and
.Dv NULL
is returned instead.
If the
.Fa unique
argument is non-zero, deduplication is performed by appending an
underscore and a decimal integer, if necessary.
If the
.Fa unique
argument is 1, this is assumed to be the first call for this tag
at this location, typically for use by
.Dv NODE_ID ,
so the integer is incremented before use.
If the
.Fa unique
argument is 2, this is assumed to be the second call for this tag
at this location, typically for use by
.Dv NODE_HREF ,
so the existing integer, if any, is used without incrementing it.
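.Pp
The byte replacement and the deduplication suffix can be sketched as
follows.
This is not the implementation in
.Pa html.c :
the function
.Fn toy_make_id
is hypothetical, the set of bytes it keeps (ASCII letters, digits,
and a few punctuation characters) is an assumption, and the caller
passes the ordinal explicitly instead of having it looked up and
incremented internally.
.Bd -literal -offset indent
```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical model: return a newly allocated identifier made
 * from text, with unwanted bytes replaced by underscores and,
 * if ordinal is positive, an underscore and that integer appended.
 * Returns NULL if malloc fails; the caller frees the result.
 */
char *
toy_make_id(const char *text, int ordinal)
{
	size_t i, len, sz;
	unsigned char c;
	char *id;

	len = strlen(text);
	sz = len + 16;	/* room for the suffix and the terminator */
	if ((id = malloc(sz)) == NULL)
		return NULL;
	for (i = 0; i < len; i++) {
		c = (unsigned char)text[i];
		/* Assumed set of bytes that are kept unchanged. */
		if (isalnum(c) || strchr("-._", c) != NULL)
			id[i] = text[i];
		else
			id[i] = '_';
	}
	id[len] = 0;
	if (ordinal > 0)
		snprintf(id + len, sz - len, "_%d", ordinal);
	return id;
}
```
.Ed
As with the string returned by
.Fn html_make_id ,
the caller of this sketch is responsible for passing the result to
.Xr free 3 .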
.Pp
The function
.Fn print_otag_id
opens a
.Fa tag
element of class
.Fa cattr
for the node
.Fa n .
If the flag
.Dv NODE_ID
is set in
.Fa n ,
it attempts to generate an
.Cm id
attribute with
.Fn html_make_id .
If the flag
.Dv NODE_HREF
is set in
.Fa n ,
an
.Aq Ic A
element of class
.Qq permalink
is added:
outside if
.Fa n
generates an element that can only occur in phrasing context,
or inside otherwise.
This function is a wrapper around
.Fn html_make_id
and
.Fn print_otag ,
automatically choosing the
.Fa unique
argument appropriately and setting the
.Fa fmt
arguments to
.Qq chR
and
.Qq ci ,
respectively.
.Pp
The function
.Fn print_endline
makes sure subsequent output starts on a new HTML output line.
If nothing was printed on the current output line yet, it has no effect.
Otherwise, it appends any buffered text to the current output line,
ends the line, and updates the internal state of the
.Fa h
object.
.Pp
The functions
.Fn print_eqn ,
.Fn print_tbl ,
and
.Fn print_tblclose
are not yet documented.
.Sh RETURN VALUES
The functions
.Fn print_otag
and
.Fn print_otag_id
return a pointer to a new element on the stack of HTML elements.
When
.Fn print_otag_id
opens two elements, a pointer to the outer one is returned.
The memory pointed to is owned by the library and is automatically
.Xr free 3 Ns d
when
.Fn print_tagq
is called on it or when
.Fn print_stagq
is called on a parent element.
.Pp
The function
.Fn html_fillmode
returns
.Dv ROFF_fi
if fill mode was active before the call or
.Dv ROFF_nf
otherwise.
.Pp
The function
.Fn html_make_id
returns a newly allocated string or
.Dv NULL
if
.Fa n
lacks text data to create the attribute from.
The caller is responsible for
.Xr free 3 Ns ing
the returned string after using it.
.Pp
In case of
.Xr malloc 3
failure, these functions do not return but call
.Xr err 3 .
.Sh FILES
.Bl -tag -width mandoc_aux.c -compact
.It Pa main.h
declarations of public functions for use by the main program,
not yet documented
.It Pa html.h
declarations of data types and private functions
for use by language-specific HTML formatters
.It Pa html.c
main HTML formatting engine and utility functions
.It Pa mdoc_html.c
.Xr mdoc 7
HTML formatter
.It Pa man_html.c
.Xr man 7
HTML formatter
.It Pa tbl_html.c
.Xr tbl 7
HTML formatter
.It Pa eqn_html.c
.Xr eqn 7
HTML formatter
.It Pa roff_html.c
.Xr roff 7
HTML formatter, handling requests like
.Ic br ,
.Ic ce ,
.Ic fi ,
.Ic ft ,
.Ic nf ,
.Ic rj ,
and
.Ic sp .
.It Pa out.h
declarations of data types and private functions
for shared use by all mandoc formatters,
not yet documented
.It Pa out.c
private functions for shared use by all mandoc formatters
.It Pa mandoc_aux.h
declarations of common mandoc utility functions, see
.Xr mandoc 3
.It Pa mandoc_aux.c
implementation of common mandoc utility functions
.El
.Sh SEE ALSO
.Xr mandoc 1 ,
.Xr mandoc 3 ,
.Xr man.cgi 8
.Sh AUTHORS
.An -nosplit
The mandoc HTML formatter was written by
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
It is maintained by
.An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
who also wrote this manual.
533 It is maintained by
534 .An Ingo Schwarze Aq Mt schwarze@openbsd.org ,
535 who also wrote this manual.