XML parsing issue with '&' in element text


Question

I have the following code:

import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Parse the inbound XML string (inputXml) into a DOM document
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(inputXml)));

and the parse step is throwing:

SAXParseException: The entity name must immediately follow 
                   the '&' in the entity reference

due to the following '&' in my inputXml:

<Line1>Day & Night</Line1>

I'm not in control of the inbound XML. How can I safely/correctly parse this?

Recommended answer

Quite simply, the input "XML" is not well-formed XML. The ampersand should be escaped as an entity reference, i.e.:

<Line1>Day &amp; Night</Line1>

Basically, there's no "proper" way to fix this other than telling the XML supplier that they're giving you garbage and getting them to fix it. If you're in some horrible situation where you've just got to deal with it, then the approach you take will likely depend on what range of values you expect to receive.

If there are no entities in the document at all, a regex replace of & with &amp; before processing would do the trick. But if they're sending some entities correctly, you'd need to exclude these from the matching. And on the rare chance that they actually wanted to send the entity code (i.e. sent &amp; but meant &amp;amp;), you're going to be completely out of luck.
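As a rough illustration of the regex approach, here is a minimal sketch that escapes only those ampersands that do not already begin a reference. The class name `AmpersandRepair` and its methods are hypothetical, not part of any library. It assumes the inbound document declares no custom entities in a DTD (so only the five predefined XML entities and numeric character references count as "correct") and contains no CDATA sections or comments, where a bare & is legal and would be wrongly escaped.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class AmpersandRepair {

    // Matches an '&' that is NOT the start of a predefined entity
    // (&amp; &lt; &gt; &quot; &apos;) or a numeric character
    // reference (&#38; or &#x26;).
    // Assumption: no DTD-declared entities, no CDATA sections.
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
            "&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Replaces every bare '&' with "&amp;" and leaves existing
    // references untouched.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMPERSAND.matcher(xml).replaceAll("&amp;");
    }

    // Repairs the input first, then parses it as in the question.
    static Document parseLenient(String inputXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        String repaired = escapeBareAmpersands(inputXml);
        return builder.parse(new InputSource(new StringReader(repaired)));
    }
}
```

Calling `AmpersandRepair.parseLenient(inputXml)` in place of the original `builder.parse(...)` call would let `<Line1>Day & Night</Line1>` through, while input that already contains `&amp;` is left as it is. It cannot help with the ambiguous case described above: a supplier who sent `&amp;` but meant `&amp;amp;` is indistinguishable from one who escaped correctly.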

But hey, it's the supplier's fault anyway, and if your attempt to fix up invalid input isn't exactly what they wanted, there's a simple thing they can do to address that. :-)
