.\" $Id: mandoc_escape.3,v 1.2 2014/10/28 14:06:31 schwarze Exp $
.\" Copyright (c) 2014 Ingo Schwarze <schwarze@openbsd.org>
.\" Permission to use, copy, modify, and distribute this software for any
.\" purpose with or without fee is hereby granted, provided that the above
.\" copyright notice and this permission notice appear in all copies.
.\" THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
.\" WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
.\" ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
.\" WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.Dd $Mdocdate: October 28 2014 $
.Nd parse roff escape sequences
.Fa "const char **end"
.Fa "const char **start"
An escape sequence consists of
.Bl -dash -compact -width 2n
.It
an initial backslash character
.Pq Sq \e ,
.It
a single ASCII character called the escape sequence identifier,
.It
and, with only a few exceptions, an argument.
.El
Arguments can be given in the following forms; some escape sequence
identifiers only accept some of these forms as specified below.
The first three forms are called the standard forms.
.Bl -tag -width 2n
.It \&In brackets: Ic \&[ Ns Ar argument Ns Ic \&]
The argument starts after the initial
.Sq \&[ ,
ends before the final
.Sq \&] ,
and the escape sequence ends with the final
.Sq \&] .
.It Two-character argument short form: Ic \&( Ns Ar ar
This form can only be used for arguments
consisting of exactly two characters.
It has the same effect as
.Ic \&[ Ns Ar ar Ns Ic \&] .
.It One-character argument short form: Ar a
This form can only be used for arguments
consisting of exactly one character.
It has the same effect as
.Ic \&[ Ns Ar a Ns Ic \&] .
.It Delimited form: Ar C Ns Ar argument Ns Ar C
The argument starts after the initial delimiter character
.Ar C ,
ends before the next occurrence of the delimiter character
.Ar C ,
and the escape sequence ends with that second
.Ar C .
Some escape sequences allow arbitrary characters
as quoting characters, while others restrict the range of characters
that can be used as quoting characters.
.El
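.Pp
As an illustration only, the argument forms listed above can be
sketched in C.
The helpers parse_std_arg and parse_delim_arg are hypothetical names
chosen for this sketch and are not part of the mandoc library.
The sketch assumes that the input pointer already points past the
escape sequence identifier, ignores identifier-specific restrictions
on delimiter characters, and does not handle escape sequences nested
inside arguments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: parse an argument in one of the three
 * standard forms.  On success, return 1 and report where the
 * argument starts, how long it is, and where the sequence ends.
 * Return 0 if the input ends before the argument is complete.
 */
static int
parse_std_arg(const char *p, const char **start, int *sz,
    const char **end)
{
	const char *close;

	switch (*p) {
	case '\0':
		return 0;		/* nothing to parse */
	case '[':			/* bracketed form: [argument] */
		close = strchr(p + 1, ']');
		if (close == NULL)
			return 0;	/* input ended before ']' */
		*start = p + 1;
		*sz = (int)(close - *start);
		*end = close + 1;
		return 1;
	case '(':			/* two-character short form: (ar */
		if (p[1] == '\0' || p[2] == '\0')
			return 0;
		*start = p + 1;
		*sz = 2;
		*end = p + 3;
		return 1;
	default:			/* one-character short form: a */
		*start = p;
		*sz = 1;
		*end = p + 1;
		return 1;
	}
}

/*
 * Hypothetical helper: parse an argument in delimited form,
 * where the first character serves as the delimiter.
 */
static int
parse_delim_arg(const char *p, const char **start, int *sz,
    const char **end)
{
	const char *close;

	if (*p == '\0')
		return 0;		/* no opening delimiter */
	close = strchr(p + 1, *p);	/* next occurrence of the delimiter */
	if (close == NULL)
		return 0;		/* input ended before the closing delimiter */
	*start = p + 1;
	*sz = (int)(close - *start);
	*end = close + 1;
	return 1;
}
```

In both helpers, the reported end position is the character after
the sequence, so a caller could resume scanning from there.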
is expected to point to the escape sequence identifier.
The values passed in as
are ignored and overwritten.
By design, this function cannot handle those
escape sequences that require in-place expansion, in particular
and numerical expression control
a private preprocessor function called from
.Bl -dash -compact -width 2n
recursively by itself, because some escape sequence arguments can
in turn contain other escape sequences,
for error detection internally by the
above all externally by the
formatting modules, in particular
for formatting purposes, see the files
and rarely externally by high-level utilities using the mandoc library,
to purge escape sequences from text.
Upon function return, the pointer
is set to the character after the end of the escape sequence,
such that the calling higher-level parser can easily continue.
For escape sequences taking an argument, the pointer
is set to the beginning of the argument and
is set to the length of the argument.
For escape sequences not taking an argument,
is set to the character after the end of the sequence and
in that case, the argument and the length are not returned.
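.Pp
The way a caller can continue from the returned end position may be
illustrated with a short C sketch that drops escape sequences from a
string.
Both skip_escape and purge_escapes are hypothetical names used only
for this sketch.
The function skip_escape is a deliberately simplified stand-in for a
real scanner: it assumes that the argument directly follows the
backslash and only understands the bracketed, two-character, and
one-character forms.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical, simplified scanner.  On entry, *end points to the
 * character after the backslash; on return, it points to the
 * character after the end of the sequence.
 */
static void
skip_escape(const char **end)
{
	const char *p = *end;
	const char *close;

	if (*p == '(') {
		/* Two-character form; stop early at the end of input. */
		p++;
		if (*p != '\0')
			p++;
		if (*p != '\0')
			p++;
	} else if (*p == '[' && (close = strchr(p, ']')) != NULL) {
		/* Bracketed form; continue after the closing bracket. */
		p = close + 1;
	} else if (*p != '\0') {
		/* One-character form, or an unterminated bracket. */
		p++;
	}
	*end = p;
}

/*
 * Copy src to dst, leaving out every escape sequence.
 * The buffer dst must provide room for strlen(src) + 1 bytes.
 */
static void
purge_escapes(char *dst, const char *src)
{
	while (*src != '\0') {
		if (*src != '\\') {
			*dst++ = *src++;
			continue;
		}
		src++;			/* step over the backslash */
		skip_escape(&src);	/* resume right after the sequence */
	}
	*dst = '\0';
}
```

Because the scanner advances the pointer it is given, the copying loop
needs no knowledge of how long each sequence is.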
For sequences taking an argument, the function
returns one of the following values:
taking an argument in standard form:
.Ic \ef[ , \ef( , \ef Ns Ar a .
Two-character arguments starting with the character
are reduced to one-character arguments by skipping the
More specific values are returned for the most commonly used arguments:
.Bl -column "argument" "ESCAPE_FONTITALIC"
.It argument Ta return value
.It Cm R No or Cm 1 Ta Dv ESCAPE_FONTROMAN
.It Cm I No or Cm 2 Ta Dv ESCAPE_FONTITALIC
.It Cm B No or Cm 3 Ta Dv ESCAPE_FONTBOLD
.It Cm P Ta Dv ESCAPE_FONTPREV
.It Cm BI Ta Dv ESCAPE_FONTBI
.El
.It Dv ESCAPE_SPECIAL
taking an argument delimited with the single quote character
and, as a special exception, the escape sequences
having an identifier, that is, those where the argument, in standard
form, directly follows the initial backslash:
.Ic \eC' , \e[ , \e( , \e Ns Ar a .
Note that the one-character argument short form can only be used for
argument characters that do not clash with escape sequence identifiers.
If the argument matches one of the forms described below under
that value is returned instead.
special character escape sequences can be rendered using the functions
.It Dv ESCAPE_UNICODE
Escape sequences of the same format as described above under
but with an argument of the forms
are hexadecimal digits and
As a special exception,
is set to the character after the
return value does not include the
Such Unicode character escape sequences can be rendered using the function
.It Dv ESCAPE_NUMBERED
followed by a delimited argument.
The delimiter character is arbitrary except that digits cannot be used.
If a digit is encountered instead of the opening delimiter, that
digit is considered to be the argument and the end of the sequence, and
Such ASCII character escape sequences can be rendered using the function
.Bl -bullet -width 2n
followed by an argument in standard form or by an argument delimited
by the single quote character:
.Ic \es' , \es[ , \es( , \es Ns Ar a .
As a special exception, an optional
character is allowed after the
followed by an argument in standard form.
followed by an argument delimited by an arbitrary character.
followed by an argument delimited by a character that cannot occur
in numerical expressions.
However, if any character that can occur in numerical expressions
is found instead of a delimiter, the sequence is considered to end
with that character, and
Escape sequences taking an argument but not matching any of the above patterns.
In particular, that happens if the end of the logical input line
is reached before the end of the argument.
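.Pp
The table of font arguments given above amounts to a small lookup,
which can be sketched in C as follows.
The enumeration font_kind and the function classify_font are
hypothetical names chosen for this sketch and do not reproduce the
library's own declarations; FONT_OTHER stands for the generic case
where none of the more specific values applies.

```c
/* Hypothetical result type mirroring the rows of the font table. */
enum font_kind {
	FONT_OTHER,	/* any argument not listed in the table */
	FONT_ROMAN,	/* R or 1 */
	FONT_ITALIC,	/* I or 2 */
	FONT_BOLD,	/* B or 3 */
	FONT_PREV,	/* P */
	FONT_BI		/* BI */
};

/*
 * Classify a font argument of length sz.
 * The argument need not be NUL-terminated.
 */
static enum font_kind
classify_font(const char *arg, int sz)
{
	if (sz == 2 && arg[0] == 'B' && arg[1] == 'I')
		return FONT_BI;
	if (sz != 1)
		return FONT_OTHER;

	switch (arg[0]) {
	case 'R':
	case '1':
		return FONT_ROMAN;
	case 'I':
	case '2':
		return FONT_ITALIC;
	case 'B':
	case '3':
		return FONT_BOLD;
	case 'P':
		return FONT_PREV;
	default:
		return FONT_OTHER;
	}
}
```

Taking the length as a separate parameter matches the way the argument
is reported as a start pointer plus a length rather than as a
terminated string.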
For sequences that do not take an argument, the function
returns one of the following values:
.It Dv ESCAPE_SKIPCHAR
.It Dv ESCAPE_NOSPACE
This function is implemented in
This function has been available since mandoc 1.11.2.
.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv
.An Ingo Schwarze Aq Mt schwarze@openbsd.org
The function doesn't cleanly distinguish between sequences that are
valid and supported, valid and ignored, valid and unsupported,
syntactically invalid, or undefined.
For sequences that are ignored or unsupported, it doesn't tell
whether that deficiency is likely to cause major formatting problems
and/or loss of document content.
The function is already rather complicated and still parses some
sequences incorrectly.
For these sequences, the list given below specifies a starting string
and either the length of the argument or an ending character.
The argument starts after the starting string.
In the former case, the sequence ends with the end of the argument.
In the latter case, the argument ends before the ending character,
and the sequence ends with the ending character.