XML parsing issue with '&' in element text
Problem description
I have the following code:
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(new InputSource(new StringReader(inputXml)));
and the parse step is throwing:
SAXParseException: The entity name must immediately follow
the '&' in the entity reference
due to the following '&' in my inputXml:
<Line1>Day & Night</Line1>
I'm not in control of the inbound XML. How can I safely/correctly parse this?
Recommended answer
Quite simply, the input "XML" is not well-formed XML. The ampersand should be encoded as an entity, i.e.:
<Line1>Day &amp; Night</Line1>
Basically, there's no "proper" way to fix this other than telling the XML supplier that they're giving you garbage and getting them to fix it. If you're in some horrible situation where you've just got to deal with it, then the approach you take will likely depend on what range of values you expect to receive.
If there are no entities in the document at all, a regex replace of & with &amp; before processing would do the trick. But if they're sending some entities correctly, you'd need to exclude these from the matching. And on the rare chance that they actually wanted to send the entity code (i.e. sent &amp; but meant &amp;amp;) you're going to be completely out of luck.
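The regex approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not a definitive fix: the class name AmpersandRepair and the methods escapeBareAmpersands and parseLenient are hypothetical names introduced here, and the pattern assumes that any '&' not followed by something shaped like a named, decimal, or hexadecimal reference ending in ';' is a stray ampersand. It would also rewrite ampersands inside CDATA sections and comments, and it cannot tell whether a correctly formed &amp; was meant literally.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: pre-processes malformed input before handing it to the parser.
class AmpersandRepair {
    // Matches '&' that is NOT the start of a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference terminated by ';'.
    private static final Pattern BARE_AMP =
        Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Replaces every stray '&' with the entity reference "&amp;".
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Repairs the text first, then parses it the same way as the original code.
    public static Document parseLenient(String inputXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        String repaired = escapeBareAmpersands(inputXml);
        return builder.parse(new InputSource(new StringReader(repaired)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseLenient("<Line1>Day & Night</Line1>");
        // The parser decodes &amp; back to '&' in the element text.
        System.out.println(doc.getDocumentElement().getTextContent()); // Day & Night
    }
}
```

Entities that are already well-formed (such as &amp; or &#38;) pass through untouched, which is the "exclude these from the matching" case. A name-shaped reference that the document never declares would still be rejected by the parser, so this only covers the stray-ampersand case.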
But hey - it's the supplier's fault anyway, and if your attempt to fix up invalid input isn't exactly what they wanted, there's a simple thing they can do to address that. :-)