author | 2014-10-26 17:12:03 +0000 | |
---|---|---|
committer | 2014-10-26 17:12:03 +0000 | |
commit | 5faa62e2445541401f9bee1667d1cd2b2e443e53 (patch) | |
tree | fd737f26543e4c9e9e08db9bc3b51103c61736a1 /html.c | |
parent | eb1d4be7915b314c92a4c377c4a09a06e811fc57 (diff) | |
Improve -Tascii output for Unicode escape sequences: For the first 512
code points, provide ASCII approximations. This is already much better
than what groff does, which prints nothing for most code points.
A few minor fixes while here:
* Handle Unicode escape sequences in the ASCII range.
* In case of errors, use the REPLACEMENT CHARACTER U+FFFD for -Tutf8
and the string "<?>" for -Tascii output.
* Handle all one-character escape sequences in mchars_spec2{cp,str}()
and remove the workarounds on the higher level.
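The error fallback described in the list above can be illustrated with a minimal sketch. The enum `out_mode` and the function `error_fallback` are hypothetical names invented for this illustration and are not mandoc's API; the sketch also assumes, for simplicity, that the -Tutf8 fallback is emitted as the raw UTF-8 byte sequence of U+FFFD, which may differ from how mandoc actually writes it.

```c
/* Hypothetical output modes; these names are not taken from mandoc. */
enum out_mode { OUT_ASCII, OUT_UTF8 };

/*
 * Hypothetical helper: return the text to print when an escape
 * sequence cannot be resolved.  Assumption: -Tutf8 gets U+FFFD
 * (UTF-8 bytes EF BF BD) and -Tascii gets the string "<?>".
 */
const char *
error_fallback(enum out_mode mode)
{
	if (mode == OUT_UTF8)
		return "\xEF\xBF\xBD";	/* U+FFFD REPLACEMENT CHARACTER */
	return "<?>";
}
```

Keeping the fallback in one place means both output modes fail visibly rather than silently printing nothing.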
Diffstat (limited to 'html.c')
-rw-r--r-- | html.c | 16 |
1 files changed, 13 insertions, 3 deletions
@@ -1,4 +1,4 @@
-/*	$Id: html.c,v 1.176 2014/10/10 15:26:29 schwarze Exp $ */
+/*	$Id: html.c,v 1.177 2014/10/26 17:12:03 schwarze Exp $ */
 /*
  * Copyright (c) 2008-2011, 2014 Kristaps Dzonsons <kristaps@bsd.lv>
  * Copyright (c) 2011, 2012, 2013, 2014 Ingo Schwarze <schwarze@openbsd.org>
@@ -437,8 +437,18 @@ print_encode(struct html *h, const char *p, int norecurse)
 		case ESCAPE_UNICODE:
 			/* Skip past "u" header. */
 			c = mchars_num2uc(seq + 1, len - 1);
-			if ('\0' != c)
-				printf("&#x%x;", c);
+
+			/*
+			 * XXX Security warning:
+			 * For now, forbid Unicode obfuscation of ASCII
+			 * characters.  An audit of the callers is
+			 * required before this can be removed.
+			 */
+
+			if (c < 0x80)
+				c = 0xFFFD;
+
+			printf("&#x%x;", c);
 			break;
 		case ESCAPE_NUMBERED:
 			c = mchars_num2char(seq, len);
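The effect of the new check in the ESCAPE_UNICODE case can be tried in isolation with a small sketch. The helpers `filter_unicode_cp` and `format_charref` are hypothetical and exist only for this illustration; they are not functions in html.c, and writing into a buffer instead of calling printf() directly is an assumption made so the result can be inspected.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper mirroring the check in the hunk: any code
 * point below 0x80 is replaced by U+FFFD, so a Unicode escape
 * cannot be used to smuggle in an ASCII character such as '<'.
 */
int
filter_unicode_cp(int c)
{
	if (c < 0x80)
		c = 0xFFFD;
	return c;
}

/*
 * Hypothetical helper: format the filtered code point as an HTML
 * hexadecimal character reference, using the same format string
 * as the diff.  Returns the value reported by snprintf().
 */
int
format_charref(char *buf, size_t sz, int c)
{
	assert(buf != NULL && sz > 0);
	return snprintf(buf, sz, "&#x%x;", (unsigned int)filter_unicode_cp(c));
}
```

With these helpers, a code point of 0x3c (the character `<`) is formatted as `&#xfffd;`, while 0x2014 passes through unchanged as `&#x2014;`.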