X-Git-Url: https://git.cameronkatri.com/mandoc.git/blobdiff_plain/a8bf4b0fd555d5f2c02407d5e3f97a00d18c6296..dcd90e7955626a38ed974eeba3b599fb534e7b62:/mandoc_char.7?ds=sidebyside

diff --git a/mandoc_char.7 b/mandoc_char.7
index 20b9f947..8d835665 100644
--- a/mandoc_char.7
+++ b/mandoc_char.7
@@ -1,6 +1,8 @@
-.\"	$Id: mandoc_char.7,v 1.43 2011/04/20 22:50:22 kristaps Exp $
+.\"	$Id: mandoc_char.7,v 1.56 2013/12/26 17:23:42 schwarze Exp $
 .\"
-.\" Copyright (c) 2009 Kristaps Dzonsons <kristaps@bsd.lv>
+.\" Copyright (c) 2003 Jason McIntyre <jmc@openbsd.org>
+.\" Copyright (c) 2009, 2010, 2011 Kristaps Dzonsons <kristaps@bsd.lv>
+.\" Copyright (c) 2011 Ingo Schwarze <schwarze@openbsd.org>
 .\"
 .\" Permission to use, copy, modify, and distribute this software for any
 .\" purpose with or without fee is hereby granted, provided that the above
@@ -14,26 +16,169 @@
 .\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 .\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 .\"
-.Dd $Mdocdate: April 20 2011 $
+.Dd $Mdocdate: December 26 2013 $
 .Dt MANDOC_CHAR 7
 .Os
 .Sh NAME
 .Nm mandoc_char
 .Nd mandoc special characters
 .Sh DESCRIPTION
-This page documents the special characters and predefined strings accepted by
+This page documents the
+.Xr roff 7
+escape sequences accepted by
 .Xr mandoc 1
-to format
+to represent special characters in
 .Xr mdoc 7
 and
 .Xr man 7
 documents.
 .Pp
-Both
-.Xr mdoc 7
-and
-.Xr man 7
-encode special characters with
+The rendering depends on the
+.Xr mandoc 1
+output mode; in ASCII output, most characters are completely
+unintelligible.
+For that reason, using any of the special characters documented here,
+except those discussed in the
+.Sx DESCRIPTION ,
+is strongly discouraged; they are supported merely for backwards
+compatibility with existing documents.
+.Pp
+In particular, in English manual pages, do not use special-character
+escape sequences to represent national language characters in author
+names; instead, provide ASCII transcriptions of the names.
+.Ss Dashes and Hyphens
+In typography there are different types of dashes of various widths:
+the hyphen (-),
+the minus sign (\-),
+the en-dash (\(en),
+and the em-dash (\(em).
+.Pp
+Hyphens are used in compound adjectives;
+to separate the two parts of a compound word;
+or to separate a word across two successive lines of text.
+The hyphen does not need to be escaped:
+.Bd -unfilled -offset indent
+blue-eyed
+lorry-driver
+.Ed
+.Pp
+The mathematical minus sign is used for negative numbers or subtraction.
+It should be written as
+.Sq \e- :
+.Bd -unfilled -offset indent
+a = 3 \e- 1;
+b = \e-2;
+.Ed
+.Pp
+The en-dash is used to separate the two elements of a range,
+or can be used the same way as an em-dash.
+It should be written as
+.Sq \e(en :
+.Bd -unfilled -offset indent
+pp. 95\e(en97.
+Go away \e(en or else!
+.Ed
+.Pp
+The em-dash can be used to show an interruption
+or can be used the same way as colons, semi-colons, or parentheses.
+It should be written as
+.Sq \e(em :
+.Bd -unfilled -offset indent
+Three things \e(em apples, oranges, and bananas.
+This is not that \e(em rather, this is that.
+.Ed
+.Pp
+Note:
+hyphens, minus signs, and en-dashes look identical under normal ASCII output.
+Other formats, such as PostScript, render them correctly,
+with differing widths.
+.Ss Spaces
+To separate words in normal text, for indenting and alignment
+in literal context, and when none of the following special cases apply,
+just use the normal space character
+.Pq Sq \  .
+.Pp
+When filling text, output lines may be broken between words, i.e. at space
+characters.
+To prevent a line break between two particular words,
+use the unpaddable non-breaking space escape sequence
+.Pq Sq \e\ \&
+instead of the normal space character.
+For example, the input string
+.Dq number\e\ 1
+will be kept together as
+.Dq number\ 1
+on the same output line.
+.Pp
+On request and macro lines, the normal space character serves as an
+argument delimiter.
+To include whitespace in arguments, quoting is usually the best choice;
+see the MACRO SYNTAX section in
+.Xr roff 7 .
+In some cases, using the non-breaking space escape sequence
+.Pq Sq \e\ \&
+may be preferable.
+.Pp
+To escape macro names and to protect whitespace at the end
+of input lines, the zero-width space
+.Pq Sq \e&
+is often useful.
+For example, in
+.Xr mdoc 7 ,
+a normal space character can be displayed in single quotes in either
+of the following ways:
+.Pp
+.Dl .Sq \(dq \(dq
+.Dl .Sq \e \e&
+.Ss Quotes
+On request and macro lines, the double-quote character
+.Pq Sq \(dq
+is handled specially to allow quoting.
+One way to prevent this special handling is by using the
+.Sq \e(dq
+escape sequence.
+.Pp
+Note that on text lines, literal double-quote characters can be used
+verbatim.
+All other quote-like characters can be used verbatim as well,
+even on request and macro lines.
+.Ss Periods
+The period
+.Pq Sq \&.
+is handled specially at the beginning of an input line,
+where it introduces a
+.Xr roff 7
+request or a macro, and when appearing alone as a macro argument in
+.Xr mdoc 7 .
+In such situations, prepend a zero-width space
+.Pq Sq \e&.
+to make it behave like normal text.
+.Pp
+Do not use the
+.Sq \e.
+escape sequence.
+It does not prevent special handling of the period.
+.Ss Backslashes
+To include a literal backslash
+.Pq Sq \e
+in the output, use the
+.Sq \ee
+escape sequence.
+.Pp
+Note that doubling it
+.Pq Sq \e\e
+is not the right way to output a backslash.
+Because
+.Xr mandoc 1
+does not implement full
+.Xr roff 7
+functionality, it may work with
+.Xr mandoc 1 ,
+but it may have unintended effects on complete
+.Xr roff 7
+implementations.
+.Sh SPECIAL CHARACTERS
+Special characters are encoded as
 .Sq \eX
 .Pq for a one-character escape ,
 .Sq \e(XX
@@ -41,53 +186,26 @@ encode special characters with
 and
 .Sq \e[N]
 .Pq N-character .
-One may generalise
-.Sq \e(XX
-as
-.Sq \e[XX]
-and
-.Sq \eX
-as
-.Sq \e[X] .
-Predefined strings are functionally similar to special characters, using
-.Sq \e*X
-.Pq for a one-character escape ,
-.Sq \e*(XX
-.Pq two-character ,
-and
-.Sq \e*[N]
-.Pq N-character .
-One may generalise
-.Sq \e*(XX
-as
-.Sq \e*[XX]
-and
-.Sq \e*X
-as
-.Sq \e*[X] .
-.Pp
-Note that each output mode will have a different rendering of the
-characters.
-It's guaranteed that each input symbol will correspond to a
-(more or less) meaningful output rendering, regardless the mode.
-.Sh SPECIAL CHARACTERS
-These are the preferred input symbols for producing special characters.
+For details, see the
+.Em Special Characters
+subsection of the
+.Xr roff 7
+manual.
 .Pp
 Spacing:
-.Bl -column -compact -offset indent "Input" "Description"
+.Bl -column "Input" "Description" -offset indent -compact
 .It Em Input Ta Em Description
-.It \e~      Ta non-breaking, non-collapsing space
-.It \e       Ta breaking, non-collapsing n-width space
-.It \e^      Ta zero-width space
-.It \e%      Ta zero-width space
+.It Sq \e\ \& Ta unpaddable non-breaking space
+.It \e~      Ta paddable non-breaking space
+.It \e0      Ta unpaddable, breaking digit-width space
+.It \e|      Ta one-sixth \e(em narrow space, zero width in nroff mode
+.It \e^      Ta one-twelfth \e(em half-narrow space, zero width in nroff
 .It \e&      Ta zero-width space
-.It \e|      Ta zero-width space
-.It \e0      Ta breaking, non-collapsing digit-width space
-.It \ec      Ta removes any trailing space (if applicable)
+.It \e%      Ta zero-width space allowing hyphenation
 .El
 .Pp
 Lines:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(ba    Ta \(ba        Ta bar
 .It \e(br    Ta \(br        Ta box rule
@@ -99,7 +217,7 @@ Lines:
 .El
 .Pp
 Text markers:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(ci    Ta \(ci        Ta circle
 .It \e(bu    Ta \(bu        Ta bullet
@@ -118,7 +236,7 @@ Text markers:
 .El
 .Pp
 Legal symbols:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(co    Ta \(co        Ta copyright
 .It \e(rg    Ta \(rg        Ta registered
@@ -126,7 +244,7 @@ Legal symbols:
 .El
 .Pp
 Punctuation:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(em    Ta \(em        Ta em-dash
 .It \e(en    Ta \(en        Ta en-dash
@@ -138,7 +256,7 @@ Punctuation:
 .El
 .Pp
 Quotes:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(Bq    Ta \(Bq        Ta right low double-quote
 .It \e(bq    Ta \(bq        Ta right low single-quote
@@ -155,7 +273,7 @@ Quotes:
 .El
 .Pp
 Brackets:
-.Bl -column -compact -offset indent "xxbracketrightbpx" Rendered Description
+.Bl -column "xxbracketrightbpx" Rendered Description -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(lB    Ta \(lB        Ta left bracket
 .It \e(rB    Ta \(rB        Ta right bracket
@@ -194,7 +312,7 @@ Brackets:
 .El
 .Pp
 Arrows:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(<-    Ta \(<-        Ta left arrow
 .It \e(->    Ta \(->        Ta right arrow
@@ -211,7 +329,7 @@ Arrows:
 .El
 .Pp
 Logical:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(AN    Ta \(AN        Ta logical and
 .It \e(OR    Ta \(OR        Ta logical or
@@ -226,7 +344,7 @@ Logical:
 .El
 .Pp
 Mathematical:
-.Bl -column -compact -offset indent "xxcoproductxx" "Rendered" "Description"
+.Bl -column "xxcoproductxx" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(pl    Ta \(pl        Ta plus
 .It \e(mi    Ta \(mi        Ta minus
@@ -288,10 +406,13 @@ Mathematical:
 .It \e(Re    Ta \(Re        Ta real
 .It \e(pd    Ta \(pd        Ta partial differential
 .It \e(-h    Ta \(-h        Ta Planck constant over 2\(*p
+.It \e[12]   Ta \[12]       Ta one-half
+.It \e[14]   Ta \[14]       Ta one-fourth
+.It \e[34]   Ta \[34]       Ta three-fourths
 .El
 .Pp
 Ligatures:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(ff    Ta \(ff        Ta ff ligature
 .It \e(fi    Ta \(fi        Ta fi ligature
@@ -308,7 +429,7 @@ Ligatures:
 .El
 .Pp
 Accents:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(a"    Ta \(a"        Ta Hungarian umlaut
 .It \e(a-    Ta \(a-        Ta macron
@@ -330,7 +451,7 @@ Accents:
 .El
 .Pp
 Accented letters:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e('A    Ta \('A        Ta acute A
 .It \e('E    Ta \('E        Ta acute E
@@ -390,7 +511,7 @@ Accented letters:
 .El
 .Pp
 Special letters:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(-D    Ta \(-D        Ta Eth
 .It \e(Sd    Ta \(Sd        Ta eth
@@ -401,7 +522,7 @@ Special letters:
 .El
 .Pp
 Currency:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(Do    Ta \(Do        Ta dollar
 .It \e(ct    Ta \(ct        Ta cent
@@ -414,7 +535,7 @@ Currency:
 .El
 .Pp
 Units:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(de    Ta \(de        Ta degree
 .It \e(%0    Ta \(%0        Ta per-thousand
@@ -424,7 +545,7 @@ Units:
 .El
 .Pp
 Greek letters:
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+.Bl -column "Input" "Rendered" "Description" -offset indent -compact
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e(*A    Ta \(*A        Ta Alpha
 .It \e(*B    Ta \(*B        Ta Beta
@@ -489,7 +610,20 @@ for use, as they differ across implementations.
 Manuals using these predefined strings are almost certainly not
 portable.
 .Pp
-.Bl -column -compact -offset indent "Input" "Rendered" "Description"
+Their syntax is similar to that of special characters, using
+.Sq \e*X
+.Pq for a one-character escape ,
+.Sq \e*(XX
+.Pq two-character ,
+and
+.Sq \e*[N]
+.Pq N-character .
+For details, see the
+.Em Predefined Strings
+subsection of the
+.Xr roff 7
+manual.
+.Bl -column "Input" "Rendered" "Description" -offset indent
 .It Em Input Ta Em Rendered Ta Em Description
 .It \e*(Ba   Ta \*(Ba       Ta vertical bar
 .It \e*(Ne   Ta \*(Ne       Ta not equal
@@ -520,6 +654,25 @@ portable.
 .It \e*(Px   Ta \*(Px       Ta POSIX standard name
 .It \e*(Ai   Ta \*(Ai       Ta ANSI standard name
 .El
+.Sh UNICODE CHARACTERS
+The escape sequences
+.Pp
+.Dl \e[uXXXX] and \eC'uXXXX'
+.Pp
+are interpreted as Unicode codepoints.
+The codepoint must be greater than U+0080 and less than U+10FFFF.
+For compatibility, the hexadecimal digits
+.Sq A
+to
+.Sq F
+must be given as uppercase characters,
+and codepoints must be zero-padded to four digits; if
+longer than four digits, no zero padding is allowed.
+Unicode surrogates are not allowed.
+.\" .Pp
+.\" Unicode glyphs attenuate to the
+.\" .Sq \&?
+.\" character if invalid or not rendered by current output media.
 .Sh NUMBERED CHARACTERS
 For backward compatibility with existing manuals,
 .Xr mandoc 1
@@ -536,12 +689,15 @@ For example, do not use \eN'34', use \e(dq, or even the plain
 .Sq \(dq
 character where possible.
 .Sh COMPATIBILITY
-This section documents compatibility between mandoc and other other
+This section documents compatibility between mandoc and other
 troff implementations, at this time limited to GNU troff
 .Pq Qq groff .
 .Pp
 .Bl -dash -compact
 .It
+The \eN\(aq\(aq escape sequence is limited to printable characters; in
+groff, it accepts arbitrary character numbers.
+.It
 In
 .Fl T Ns Cm ascii ,
 the
@@ -569,16 +725,19 @@ from mandoc either because they are poorly documented or they have no
 known representation.
 .El
 .Sh SEE ALSO
-.Xr mandoc 1
+.Xr mandoc 1 ,
+.Xr man 7 ,
+.Xr mdoc 7 ,
+.Xr roff 7
 .Sh AUTHORS
 The
 .Nm
 manual page was written by
-.An Kristaps Dzonsons Aq kristaps@bsd.lv .
+.An Kristaps Dzonsons Aq Mt kristaps@bsd.lv .
 .Sh CAVEATS
-The
+The predefined string
 .Sq \e*(Ba
-escape mimics the behaviour of the
+mimics the behaviour of the
 .Sq \&|
 character in
 .Xr mdoc 7 ;