Encoding for an XML document containing U+001A
Question
I have an XML document that's being generated from some content that people are copy/pasting from all sorts of places (Word documents mostly though).
It looks something like this:
<?xml version="1.0" encoding="UTF-8"?>
<response>
<data> <![CDATA[
(whatever was pasted)
]]></data>
</response>
I've always used an encoding of UTF-8 or iso-8859-1, but now someone's gone and copy/pasted the Unicode character U+001A (0x1a) and I can't find an encoding that will accept it. Everything I put the XML file into (e.g. Firefox, Internet Explorer, XML Spy) says it's invalid, regardless of the encoding used.
Is there an encoding I can use that will stop the file from falling over, or do I need to start stripping all these characters out one by one?
Recommended answer
U+001A is not a valid character in an XML 1.0 document, so no choice of encoding will make it acceptable: the restriction applies to the code point itself, not to how it is encoded. The valid range of characters according to the specification is:
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
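Since changing the encoding cannot help, the pasted text has to be cleaned before it is written into the document. The following is a minimal sketch in Python (the question does not say which language generates the XML, so that choice is an assumption). It builds a regular expression from the Char production above and removes everything outside it. The names `strip_invalid_xml_chars` and `_INVALID_XML_CHARS` are made up for illustration.

```python
import re

# Negated character class built from the XML 1.0 Char production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# Anything matched by this pattern is NOT allowed in an XML 1.0 document.
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return text with every character outside the Char production replaced.

    By default the offending characters are dropped; pass a replacement
    such as "\ufffd" to leave a visible marker instead.
    """
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    pasted = "before\x1aafter"  # hypothetical pasted content containing U+001A
    print(repr(strip_invalid_xml_chars(pasted)))
```

Applying one filter like this to all pasted content before wrapping it in the CDATA section avoids having to strip characters one by one as they turn up. Whether to drop the characters silently or substitute a marker depends on how much the original content matters to readers.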