Parsing Unicode character (0x2) using XML 1.1


Question

In my Java application, I need to parse an XML document that contains the control character 0x2 inside CDATA.

I tried a few ways but couldn't get it to work. I want to avoid any sort of encoding.

Is there any way to do this in XML 1.1?

Recommended answer

"I need to parse XML that contains control character 0x2 inside CDATA"

That's not XML, then. A raw control character U+0002 anywhere means the input is not well-formed and hence not an XML document.

In XML 1.1 only, one may include control characters encoded as character references. So you might have tried to fix it up by doing a string replace of \x02 with &#2; before parsing. However, you can't put character references in CDATA sections, so that's not going to fly either.

Edit: you could probably fix it in the short term, if you are absolutely sure that every stray U+0002 character is inside a CDATA section, by replacing each with:

]]>&#2;<![CDATA[

However, this is super-shonky. Whatever generated the faulty XML in the first place needs to be fixed. Go kick the person responsible for creating it!
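As a rough illustration of the short-term workaround above, here is a minimal Java sketch. The class and method names (ControlCharRepair, repair, parse) are hypothetical and not from the original answer. It assumes the input declares version="1.1", that every stray U+0002 really is inside a CDATA section, and that the JAXP parser in use supports XML 1.1 (the one bundled with the JDK is expected to).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper; names are illustrative only.
class ControlCharRepair {
    // The raw control character U+0002.
    static final String STX = String.valueOf((char) 0x02);

    // Closes the current CDATA section, emits U+0002 as a character
    // reference, then reopens CDATA.
    static final String REPLACEMENT = "]]>&#2;<![CDATA[";

    // Assumption: every U+0002 in rawXml sits inside a CDATA section.
    // A U+0002 anywhere else would leave the result malformed.
    static String repair(String rawXml) {
        return rawXml.replace(STX, REPLACEMENT);
    }

    // Repairs the text, then parses it with the default JAXP DOM parser.
    // Assumption: the document declares version="1.1"; under XML 1.0 the
    // reference &#2; is itself a well-formedness error.
    static Document parse(String rawXml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(repair(rawXml))));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("Repaired input still failed to parse", e);
        }
    }
}
```

Calling ControlCharRepair.parse(rawXml) and then getDocumentElement().getTextContent() on the result should give back the element text with the control character intact, since getTextContent() concatenates the CDATA and text children. This only papers over the problem; the producer of the data still needs fixing.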
