Encoding for an XML document containing U+001A

Question

I have an XML document that's being generated from content that people copy/paste from all sorts of places (mostly Word documents, though).

It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
    <data> <![CDATA[
         (whatever was pasted)
    ]]></data>
</response>

I've always used an encoding of UTF-8 or iso-8859-1, but now someone's gone and copy/pasted the Unicode character U+001A (0x1a), and I can't find an encoding that will accept it. Everything I put the XML file into (e.g. Firefox, Internet Explorer, XML Spy) says it's invalid, regardless of the encoding used.

Is there an encoding I can use that will stop the file from falling over, or do I need to start stripping all these characters out one by one?

Recommended answer

U+001A is not a valid character in an XML document, so no choice of encoding will make it acceptable. The valid range of characters according to the specification is:

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
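
Because the restriction applies to code points rather than to their byte encoding, one way to handle pasted input is to filter it against the Char production before writing it into the CDATA section. The following is only a minimal sketch of that idea in Python; the function name `strip_invalid_xml_chars` and its `replacement` parameter are hypothetical and do not come from the original answer.

```python
import re

# Matches any code point that is NOT allowed by the XML 1.0 Char production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str, replacement: str = "") -> str:
    """Return text with every character outside the XML 1.0 Char range
    replaced by `replacement` (removed by default)."""
    return _INVALID_XML_CHARS.sub(replacement, text)


if __name__ == "__main__":
    pasted = "before\x1aafter"  # U+001A as it might arrive from a paste
    print(repr(strip_invalid_xml_chars(pasted)))
```

Removing the characters outright is the simplest choice; passing a visible `replacement` such as "?" instead may be preferable if it matters that something was dropped.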
