1 files changed, 204 insertions, 0 deletions
diff --git a/text_cmds/sed/POSIX b/text_cmds/sed/POSIX
new file mode 100644
index 0000000..239acf8
--- /dev/null
+++ b/text_cmds/sed/POSIX
@@ -0,0 +1,204 @@
+#	@(#)POSIX	8.1 (Berkeley) 6/6/93
+# $FreeBSD$
+
+Comments on the IEEE P1003.2 Draft 12
+     Part 2: Shell and Utilities
+  Section 4.55: sed - Stream editor
+
+Diomidis Spinellis <dds@doc.ic.ac.uk>
+Keith Bostic <bostic@cs.berkeley.edu>
+
+In the following paragraphs, "wrong" usually means "inconsistent with
+historic practice", as most of the following comments refer to
+undocumented inconsistencies between the historical versions of sed and
+the POSIX 1003.2 standard.  All the comments are notes taken while
+implementing a POSIX-compatible version of sed, and should not be
+interpreted as official opinions or criticism towards the POSIX committee.
+All uses of "POSIX" refer to section 4.55, Draft 12 of POSIX 1003.2.
+
+ 1.	32V and BSD derived implementations of sed strip the text
+	arguments of the a, c and i commands of their initial blanks,
+	i.e.
+
+	#!/bin/sed -f
+	a\
+		foo\
+		\  indent\
+		bar
+
+	produces:
+
+	foo
+	  indent
+	bar
+
+	POSIX does not specify this behavior as the System V versions of
+	sed do not do this stripping.  The argument against stripping is
+	that it is difficult to write sed scripts that have leading blanks
+	if they are stripped.  The argument for stripping is that it is
+	difficult to write readable sed scripts unless indentation is allowed
+	and ignored, and leading whitespace is obtainable by entering a
+	backslash in front of it.  This implementation follows the BSD
+	historic practice.
+
+ 2.	Historical versions of sed required that the w flag be the last
+	flag to an s command as it takes an additional argument.  This
+	is obvious, but not specified in POSIX.
+
+ 3.	Historical versions of sed required that whitespace follow a w
+	flag to an s command.  This is not specified in POSIX.  This
+	implementation permits whitespace but does not require it.
+
+ 4.	Historical versions of sed permitted any number of whitespace
+	characters to follow the w command.  This is not specified in
+	POSIX.  This implementation permits whitespace but does not
+	require it.
+
+ 5.	The rule for the l command differs from historic practice.  Table
+	2-15 includes the various ANSI C escape sequences, including \\
+	for backslash.  Some historical versions of sed displayed two
+	digit octal numbers, too, not three as specified by POSIX.  POSIX
+	is a cleanup, and is followed by this implementation.
+
+ 6.	The POSIX specification for ! does not specify that for a single
+	command the command must not contain an address specification
+	whereas the command list can contain address specifications.  The
+	specification for ! implies that "3!/hello/p" works, and it never
+	has, historically.  Note,
+
+		3!{
+			/hello/p
+		}
+
+	does work.
+
+ 7.	POSIX does not specify what happens with consecutive ! commands
+	(e.g. /foo/!!!p).  Historic implementations allow any number of
+	!'s without changing the behaviour.  (It seems logical that each
+	one might reverse the behaviour.)  This implementation follows
+	historic practice.
+
+ 8.	Historic versions of sed permitted commands to be separated
+	by semi-colons, e.g. 'sed -ne '1p;2p;3q' printed the first
+	three lines of a file.  This is not specified by POSIX.
+	Note, the ; command separator is not allowed for the commands
+	a, c, i, w, r, :, b, t, # and at the end of a w flag in the s
+	command.  This implementation follows historic practice and
+	implements the ; separator.
+
+ 9.	Historic versions of sed terminated the script if EOF was reached
+	during the execution of the 'n' command, i.e.:
+
+	sed -e '
+	n
+	i\
+	hello
+	' </dev/null
+
+	did not produce any output.  POSIX does not specify this behavior.
+	This implementation follows historic practice.
+
+10.	Deleted.
+
+11.	Historical implementations do not output the change text of a c
+	command in the case of an address range whose first line number
+	is greater than the second (e.g. 3,1).  POSIX requires that the
+	text be output.  Since the historic behavior doesn't seem to have
+	any particular purpose, this implementation follows the POSIX
+	behavior.
+
+12.	POSIX does not specify whether address ranges are checked and
+	reset if a command is not executed due to a jump.  The following
+	program will behave in different ways depending on whether the
+	'c' command is triggered at the third line, i.e. will the text
+	be output even though line 3 of the input will never logically
+	encounter that command.
+
+	2,4b
+	1,3c\
+		text
+
+	Historic implementations did not output the text in the above
+	example.  Therefore it was believed that a range whose second
+	address was never matched extended to the end of the input.
+	However, the current practice adopted by this implementation,
+	as well as by those from GNU and SUN, is as follows:  The text
+	from the 'c' command still isn't output because the second address
+	isn't actually matched; but the range is reset after all if its
+	second address is a line number.  In the above example, only the
+	first line of the input will be deleted.
+
+13.	Historical implementations allow an output suppressing #n at the
+	beginning of -e arguments as well as in a script file.  POSIX
+	does not specify this.  This implementation follows historical
+	practice.
+
+14.	POSIX does not explicitly specify how sed behaves if no script is
+	specified.  Since the sed Synopsis permits this form of the command,
+	and the language in the Description section states that the input
+	is output, it seems reasonable that it behave like the cat(1)
+	command.  Historic sed implementations behave differently for "ls |
+	sed", where they produce no output, and "ls | sed -e#", where they
+	behave like cat.  This implementation behaves like cat in both cases.
+
+15.	The POSIX requirement to open all w files at the beginning makes
+	sed behave nonintuitively when the w commands are preceded by
+	addresses or are within conditional blocks.  This implementation
+	follows historic practice and POSIX, by default, and provides the
+	-a option which opens the files only when they are needed.
+
+16.	POSIX does not specify how escape sequences other than \n and \D
+	(where D is the delimiter character) are to be treated.  This is
+	reasonable, however, it also doesn't state that the backslash is
+	to be discarded from the output regardless.  A strict reading of
+	POSIX would be that "echo xyz | sed s/./\a" would display "\ayz".
+	As historic sed implementations always discarded the backslash,
+	this implementation does as well.
+
+17.	POSIX specifies that an address can be "empty".  This implies
+	that constructs like ",d" or "1,d" and ",5d" are allowed.  This
+	is not true for historic implementations or this implementation
+	of sed.
+
+18.	The b t and : commands are documented in POSIX to ignore leading
+	white space, but no mention is made of trailing white space.
+	Historic implementations of sed assigned different locations to
+	the labels "x" and "x ".  This is not useful, and leads to subtle
+	programming errors, but it is historic practice and changing it
+	could theoretically break working scripts.  This implementation
+	follows historic practice.
+
+19.	Although POSIX specifies that reading from files that do not exist
+	from within the script must not terminate the script, it does not
+	specify what happens if a write command fails.  Historic practice
+	is to fail immediately if the file cannot be opened or written.
+	This implementation follows historic practice.
+
+20.	Historic practice is that the \n construct can be used for either
+	string1 or string2 of the y command.  This is not specified by
+	POSIX.  This implementation follows historic practice.
+
+21.	Deleted.
+
+22.	Historic implementations of sed ignore the RE delimiter characters
+	within character classes.  This is not specified in POSIX.  This
+	implementation follows historic practice.
+
+23.	Historic implementations handle empty RE's in a special way: the
+	empty RE is interpreted as if it were the last RE encountered,
+	whether in an address or elsewhere.  POSIX does not document this
+	behavior.  For example the command:
+
+		sed -e /abc/s//XXX/
+
+	substitutes XXX for the pattern abc.  The semantics of "the last
+	RE" can be defined in two different ways:
+
+	1. The last RE encountered when compiling (lexical/static scope).
+	2. The last RE encountered while running (dynamic scope).
+
+	While many historical implementations fail on programs depending
+	on scope differences, the SunOS version exhibited dynamic scope
+	behaviour.  This implementation does dynamic scoping, as this seems
+	the most useful and in order to remain consistent with historical
+	practice.