Is there a Java XML API that can parse a document without resolving character entities?


Question

I have a program that needs to parse XML containing character entities. The program itself doesn't need them resolved, and the list of entities is large and will change, so I want to avoid explicit support for them if I can.

Here is a simple example:

<?xml version="1.0" encoding="UTF-8"?>
<xml>Hello there &something;</xml>

Is there a Java XML API that can parse a document successfully without resolving (non-standard) character entities? Ideally it would turn each one into a special event or object that could be handled separately, but I'd settle for an option that silently suppresses them.

Answer & example:

Skaffman gave me the answer: use a StAX parser with IS_REPLACING_ENTITY_REFERENCES set to false.

Here's the code I whipped up to try it out:

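import java.io.FileInputStream;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

// Ask the parser to report entity references instead of expanding them.
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, false);
XMLEventReader reader = inputFactory.createXMLEventReader(
    new FileInputStream("your file here"));

while (reader.hasNext()) {
    XMLEvent event = reader.nextEvent();
    // Unresolved references arrive as EntityReference events.
    if (event.isEntityReference()) {
        EntityReference ref = (EntityReference) event;
        System.out.println("Entity Reference: " + ref.getName());
    }
}
reader.close();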

For the above XML, it will print "Entity Reference: something".
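The question also asked for a way to handle the references specially rather than just detect them. As a minimal sketch of that idea, the helper below rebuilds the text content and writes each unresolved reference back out as &name;. The class name EntityPreservingText and the method textWithEntities are hypothetical names chosen for illustration, and the behaviour depends on the StAX implementation honouring the property when it is set to false.

```java
import java.io.StringReader;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.EntityReference;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper, not part of any library.
class EntityPreservingText {

    // Returns the character content of the document, with every
    // unresolved entity reference written back as "&name;".
    static String textWithEntities(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the implementation honours "false" for this property.
        factory.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);

        StringBuilder out = new StringBuilder();
        try {
            XMLEventReader reader = factory.createXMLEventReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    XMLEvent event = reader.nextEvent();
                    if (event.isCharacters()) {
                        out.append(event.asCharacters().getData());
                    } else if (event.isEntityReference()) {
                        String name = ((EntityReference) event).getName();
                        out.append('&').append(name).append(';');
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(textWithEntities("<xml>Hello there &something;</xml>"));
    }
}
```

If the parser in use ignores the property, an undeclared entity such as &something; will most likely surface as a parse error instead, which this sketch rethrows as an IllegalArgumentException.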

Recommended answer

The StAX API supports the notion of not replacing character entity references, by way of the IS_REPLACING_ENTITY_REFERENCES property:

"Requires the parser to replace internal entity references with their replacement text and report them as characters"

This property can be set on an XMLInputFactory, which is then used to construct an XMLEventReader or XMLStreamReader. However, the API documentation is careful to say only that the property requires the implementation to perform the replacement when it is true; it does not promise that setting it to false will prevent replacement. Still, it has to be worth a try.
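Because support for the false setting may vary between implementations, it can help to check the property before relying on it. The following is a rough sketch using the cursor-style XMLStreamReader; the class EntityNames and the method entityNames are hypothetical names for illustration, and a successful isPropertySupported check only shows that the property is recognised, not that replacement is actually switched off.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of any library.
class EntityNames {

    // Collects the names of all entity references the parser reports.
    static List<String> entityNames(String xml) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        String property = XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES;
        if (!factory.isPropertySupported(property)) {
            throw new UnsupportedOperationException(
                "This StAX implementation does not recognise " + property);
        }
        factory.setProperty(property, Boolean.FALSE);

        List<String> names = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // For ENTITY_REFERENCE events, getLocalName() returns the entity name.
                    if (reader.next() == XMLStreamConstants.ENTITY_REFERENCE) {
                        names.add(reader.getLocalName());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(entityNames("<xml>Hello there &something;</xml>"));
    }
}
```

An empty result for a document known to contain references would suggest that the implementation replaced or dropped them despite the setting.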
