.\" $Id: mandoc_escape.3,v 1.4 2017/07/04 23:40:01 schwarze Exp $
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.Dd $Mdocdate: July 4 2017 $
.Nd parse roff escape sequences
.Fa "const char **end"
.Fa "const char **start"
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
.Pp
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
.Ar C
as quoting characters, some restrict the range of characters
that can be used as quoting characters.
.El
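.Pp
The argument forms listed above can be illustrated with a small,
self-contained C sketch.
It is not taken from the mandoc sources: the helper names
std_arg and delim_arg are hypothetical, both expect a pointer to the
character following the escape sequence identifier, and neither
handles escape sequences nested inside an argument.
.Bd -literal -offset indent
```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not part of mandoc: parse an argument given
 * in one of the three standard forms.  On success, store the start
 * and length of the argument and return a pointer to the character
 * after the escape sequence.  Return NULL if the input ends before
 * the argument is complete.
 */
static const char *
std_arg(const char *p, const char **arg, int *len)
{
	const char *close;

	switch (*p) {
	case '[':		/* bracketed form: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return NULL;
		*arg = p + 1;
		*len = (int)(close - *arg);
		return close + 1;
	case '(':		/* two-character short form: (ar */
		if (p[1] == 0 || p[2] == 0)
			return NULL;
		*arg = p + 1;
		*len = 2;
		return p + 3;
	case 0:			/* nothing follows the identifier */
		return NULL;
	default:		/* one-character short form: a */
		*arg = p;
		*len = 1;
		return p + 1;
	}
}

/*
 * Hypothetical helper for the delimited form: the first character
 * is the delimiter C, and the argument runs up to the next C.
 */
static const char *
delim_arg(const char *p, const char **arg, int *len)
{
	const char *close;

	if (*p == 0)
		return NULL;
	close = strchr(p + 1, *p);
	if (close == NULL)
		return NULL;
	*arg = p + 1;
	*len = (int)(close - *arg);
	return close + 1;
}
```
.Ed
In this sketch, the bracketed input [ar], the short form (ar, and the
delimited input 'ar' all yield the same two-character argument.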
is expected to point to the escape sequence identifier.
The values passed in as
are ignored and overwritten.
By design, this function cannot handle those
escape sequences that require in-place expansion, in particular
and numerical expression control
a private preprocessor function called from
.Bl -dash -compact -width 2n
.It
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
.It
for error detection internally by the
library, see the file
.It
above all externally by the
formatting modules, in particular
for formatting purposes, see the files
.It
and rarely externally by high-level utilities using the mandoc library,
to purge escape sequences from text.
.El
Upon function return, the pointer
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
For escape sequences taking an argument, the pointer
is set to the beginning of the argument and
is set to the length of the argument.
For escape sequences not taking an argument,
is set to the character after the end of the sequence and
in that case, the argument and the length are not returned.
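.Pp
The calling convention described above, where the caller points at the
escape sequence identifier and resumes parsing at the returned end
pointer, can be sketched as a loop that purges escape sequences from
a string.
This is a minimal sketch and not mandoc code: the type esc_parser only
mirrors the three-pointer interface with an int return type, the
parameter names are illustrative, and toy_parser is a hypothetical
stand-in that treats every sequence as a single identifier character.
.Bd -literal -offset indent
```c
#include <stddef.h>

#define BACKSLASH 0x5c	/* ASCII code of the backslash character */

/* Stand-in for the parser interface; the return type is an assumption. */
typedef int (*esc_parser)(const char **end, const char **start, int *sz);

/*
 * Copy "in" to "out", dropping every escape sequence.
 * "out" must have room for strlen(in) + 1 bytes.
 */
static void
purge_escapes(const char *in, char *out, esc_parser parse)
{
	const char *start;
	int sz;

	while (*in != 0) {
		if (*in != BACKSLASH) {
			*out++ = *in++;
			continue;
		}
		in++;		/* point at the identifier */
		if (*in == 0)
			break;	/* trailing backslash, nothing to parse */
		/* The parser moves "in" past the whole sequence. */
		(void)parse(&in, &start, &sz);
	}
	*out = 0;
}

/* Toy parser for demonstration only: skip one identifier character. */
static int
toy_parser(const char **end, const char **start, int *sz)
{
	*start = *end;
	*sz = 0;
	(*end)++;
	return 0;
}
```
.Ed
A real caller would pass a wrapper around the library function in place
of toy_parser and would inspect the return value before discarding the
sequence.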
For sequences taking an argument, the function
returns one of the following values:
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
are reduced to one-character arguments by skipping the
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_SPECIAL
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
.Em not
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
If the argument matches one of the forms described below under
that value is returned instead.
special character escape sequences can be rendered using the functions
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
but with an argument of the forms
are hexadecimal digits and
As a special exception,
is set to the character after the
return value does not include the
Such Unicode character escape sequences can be rendered using the function
.It Dv ESCAPE_NUMBERED
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
Such ASCII character escape sequences can be rendered using the function
.It Dv ESCAPE_OVERSTRIKE
followed by an argument delimited by an arbitrary character.
.Bl -bullet -width 2n
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
character is allowed after the
followed by an argument in standard form.
followed by an argument delimited by an arbitrary character.
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
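.Pp
The table of font arguments given above amounts to a small lookup on
the argument text and its length.
The following sketch is an illustration under stated assumptions, not
the library implementation: the enum font_kind and the function
classify_font are hypothetical local stand-ins for the return values
named in the table, and FONT_OTHER stands for any argument the table
does not list.
.Bd -literal -offset indent
```c
#include <string.h>

/* Local stand-ins for the return values listed in the table. */
enum font_kind {
	FONT_OTHER,	/* any argument not listed in the table */
	FONT_ROMAN,	/* R or 1 */
	FONT_ITALIC,	/* I or 2 */
	FONT_BOLD,	/* B or 3 */
	FONT_PREV,	/* P */
	FONT_BI		/* BI */
};

/* Map a font argument of the given length to its specific value. */
static enum font_kind
classify_font(const char *arg, int len)
{
	if (len == 2 && strncmp(arg, "BI", 2) == 0)
		return FONT_BI;
	if (len != 1)
		return FONT_OTHER;

	switch (*arg) {
	case 'R':
	case '1':
		return FONT_ROMAN;
	case 'I':
	case '2':
		return FONT_ITALIC;
	case 'B':
	case '3':
		return FONT_BOLD;
	case 'P':
		return FONT_PREV;
	default:
		return FONT_OTHER;
	}
}
```
.Ed
The argument pointer and length are passed separately because the
parser reports an argument as a start pointer plus a length rather
than as a terminated string.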
For sequences that do not take an argument, the function
returns one of the following values:
.It Dv ESCAPE_SKIPCHAR
.It Dv ESCAPE_NOSPACE
This function is implemented in
This function has been available since mandoc 1.11.2.
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.
For these sequences, the list given below specifies a starting string
and either the length of the argument or an ending character.
The argument starts after the starting string.
In the former case, the sequence ends with the end of the argument.
In the latter case, the argument ends before the ending character,
and the sequence ends with the ending character.