-.\" $Id: preconv.1,v 1.1 2011/05/26 12:01:14 kristaps Exp $
+.\" $Id: preconv.1,v 1.5 2011/08/18 08:58:44 kristaps Exp $
.\"
.\" Copyright (c) 2011 Kristaps Dzonsons <kristaps@bsd.lv>
.\"
.\" ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
.\" OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
.\"
-.Dd $Mdocdate: May 26 2011 $
+.Dd $Mdocdate: August 18 2011 $
.Dt PRECONV 1
.Os
.Sh NAME
.Nm preconv
-.Nd recodes multibyte UNIX manuals as mandoc input
+.Nd recode multibyte UNIX manuals
.Sh SYNOPSIS
.Nm preconv
.Op Fl D Ar enc
.Ux
manual files into
.Xr mandoc 1
+.Po
+or other troff system supporting the
+.Sq \e[uNNNN]
+escape sequence
+.Pc
input.
Its arguments are as follows:
.Bl -tag -width Ds
.It Fl D Ar enc
The default encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
.It Fl e Ar enc
The document's encoding.
-This is case-insensitive.
-See
-.Sx Algorithm
-and
-.Sx Encodings .
.It Ar file
The input file.
.El
is not provided,
.Nm
accepts standard input.
-Output is written to standard output.
-Unicode characters in the ASCII range are printed as regular ASCII
-characters; those above this range are printed using the
+See
+.Sx Algorithm
+for encoding choice.
+.Pp
+The recoded input is written to standard output: Unicode characters in
+the ASCII range are printed as regular ASCII characters, while those
+above this range are printed using the
.Sq \e[uNNNN]
format documented in
.Xr mandoc_char 7 .
.Pp
If input bytes are improperly formed in the current encoding, they're
passed unmodified to standard output.
-.Ss Encodings
-The
+For some encodings, such as UTF-8, unrecoverable input sequences will
+cause
.Nm
-utility accepts the
-.Ar utf\-8 ,
-.Ar us\-ascii ,
-and
-.Ar latin\-1
-encodings as arguments to
-.Fl D Ar enc
-or
-.Fl e Ar enc .
+to stop processing and exit.
.Ss Algorithm
An encoding is chosen according to the following steps:
.Bl -enum
From the argument passed to
.Fl e Ar enc .
.It
-If a BOM exists, utf\-8 encoding is selected.
+If a BOM exists, UTF\-8 encoding is selected.
+.It
+From the coding tags parsed from
+.Qq File Variables
+on the first two lines of input.
+A file variable is an input line of the form
+.Pp
+.Dl \%.\e\(dq -*- key: val [; key: val ]* -*-
+.Pp
+A coding tag variable is where
+.Cm key
+is
+.Qq coding
+and
+.Cm val
+is the name of the encoding.
+A typical file variable with a coding tag is
+.Pp
+.Dl \%.\e\(dq -*- mode: troff; coding: utf-8 -*-
.It
From the argument passed to
.Fl D Ar enc .
.It
If all else fails, Latin\-1 is used.
.El
+.Pp
+The
+.Nm
+utility recognises the UTF\-8, us\-ascii, and latin\-1 encodings as
+passed to the
+.Fl e
+and
+.Fl D
+arguments, or as coding tags.
+Encodings are matched case-insensitively.
.\" .Sh IMPLEMENTATION NOTES
.\" Not used in OpenBSD.
.\" .Sh RETURN VALUES
.\" .Sh FILES
.Sh EXIT STATUS
.Ex -std
-.\" .Sh EXAMPLES
+.Sh EXAMPLES
+Explicitly page a UTF\-8 manual
+.Pa foo.1
+in the current locale:
+.Pp
+.Dl $ preconv \-e utf\-8 foo.1 | mandoc -Tlocale | less
.\" .Sh DIAGNOSTICS
.\" For sections 1, 4, 6, 7, & 8 only.
.\" .Sh ERRORS
The
.Nm
utility was written by
-.An Kristaps Dzonsons Aq kristaps@bsd.lv .
+.An Kristaps Dzonsons ,
+.Mt kristaps@bsd.lv .
.\" .Sh CAVEATS
.\" .Sh BUGS
.\" .Sh SECURITY CONSIDERATIONS