Disable XML Entity resolving in JDOM / DOM

Problem description

I am writing a Java application for post-processing XML files. These XML files come from the RDF export of a Semantic MediaWiki, so they use RDF/XML syntax.

My problem is the following: when I read the XML file, all the entities in the file get resolved to the values specified in the DOCTYPE. For example, in the DOCTYPE I have

<!DOCTYPE rdf:RDF[
<!ENTITY wiki 'http://example.org/smartgrid/index.php/Special:URIResolver/'>
..
]>

and the root element

<rdf:RDF
xmlns:wiki="&wiki;"
..
>

This means that

<swivt:Subject rdf:about="&wiki;Main_Page">

becomes

<swivt:Subject rdf:about="http://example.org/smartgrid/index.php/Special:URIResolver/Main_Page">
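The example above touches a limit that no parser setting shown further down can lift: `&wiki;` is used inside attribute values (`rdf:about`, `xmlns:wiki`), and XML 1.0 attribute-value normalization (section 3.3.3 of the specification) requires a conforming parser to replace entity references in attribute values with their replacement text. The sketch below assumes the JDK's built-in DOM parser and a trimmed version of the question's document; the `swivt` namespace URI is an assumed placeholder. It reads `rdf:about` back with `setExpandEntityReferences(false)` and still gets the expanded URI.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class AttributeEntityDemo {
    static final String RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    // Placeholder namespace URI for the swivt prefix (assumption, not taken from the question).
    static final String SWIVT_NS = "http://semantic-mediawiki.org/swivt/1.0#";

    // Trimmed-down version of the document in the question.
    static final String SAMPLE =
          "<!DOCTYPE rdf:RDF [\n"
        + "<!ENTITY wiki 'http://example.org/smartgrid/index.php/Special:URIResolver/'>\n"
        + "]>\n"
        + "<rdf:RDF xmlns:rdf='" + RDF_NS + "' xmlns:swivt='" + SWIVT_NS + "'\n"
        + "         xmlns:wiki='&wiki;'>\n"
        + "  <swivt:Subject rdf:about='&wiki;Main_Page'/>\n"
        + "</rdf:RDF>\n";

    /** Parses the document and returns the rdf:about value of the first swivt:Subject. */
    static String aboutAttribute(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Same setting as in the question; it does not apply to attribute values.
        factory.setExpandEntityReferences(false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element subject = (Element) doc.getElementsByTagNameNS(SWIVT_NS, "Subject").item(0);
        return subject.getAttributeNS(RDF_NS, "about");
    }

    public static void main(String[] args) throws Exception {
        // Prints the expanded URI rather than the literal "&wiki;Main_Page".
        System.out.println(aboutAttribute(SAMPLE));
    }
}
```

If the literal `&wiki;` form is needed in the output, it would have to be reintroduced when serializing, since the parsed tree no longer contains it.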

I have tried using JDOM and the standard Java DOM. The code I think is relevant here is for standard DOM:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setExpandEntityReferences(false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
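For comparison, `setExpandEntityReferences(false)` does take effect for entity references in element content: the DOM then keeps an `EntityReference` node instead of folding the replacement text into a `Text` node. A minimal sketch, assuming the JDK's built-in parser and a hypothetical one-element document `<r>&wiki;Main_Page</r>`:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ElementContentEntityDemo {
    // Hypothetical minimal document: the entity is used in element content, not in an attribute.
    static final String SAMPLE =
          "<!DOCTYPE r [<!ENTITY wiki 'http://example.org/smartgrid/index.php/Special:URIResolver/'>]>"
        + "<r>&wiki;Main_Page</r>";

    /** Returns the DOM node type of the root element's first child. */
    static short firstChildType(String xml, boolean expand) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setExpandEntityReferences(expand);
        // Xerces feature from the question; the JDK's built-in parser recognises it.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getFirstChild().getNodeType();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstChildType(SAMPLE, false) == Node.ENTITY_REFERENCE_NODE);
        System.out.println(firstChildType(SAMPLE, true) == Node.TEXT_NODE);
    }
}
```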

and for JDOM:

SAXBuilder builder = new SAXBuilder();
builder.setExpandEntities(false); //Retain Entities
builder.setValidation(false);
builder.setFeature("http://xml.org/sax/features/resolve-dtd-uris", false);

But the entities are resolved throughout the whole XML document nonetheless. Am I missing something? Hours of searching have only led me to the 'ExpandEntities' settings, but they don't seem to work.

Any hint is highly appreciated :)

Recommended answer

I recommend the JDOM FAQ:

[http://www.jdom.org/docs/faq.html#a0350]

How do I keep the DTD from loading? Even when I turn off validation, the parser tries to load the DTD file.

Even when validation is turned off, an XML parser will by default load the external DTD file in order to parse the DTD for external entity declarations. Xerces has a feature to turn off this behavior named "http://apache.org/xml/features/nonvalidating/load-external-dtd" and if you know you're using Xerces you can set this feature on the builder.

builder.setFeature(
  "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
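A minimal, self-contained sketch of the same feature applied to a plain SAX `XMLReader` from the JDK, whose built-in parser is derived from Xerces. It assumes a hypothetical document whose external DTD `no-such-file.dtd` does not exist; with the feature set to `false` the parse should complete without trying to open that file.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class SkipExternalDtdDemo {
    // Hypothetical document whose external DTD does not exist anywhere.
    static final String SAMPLE = "<!DOCTYPE r SYSTEM 'no-such-file.dtd'><r>text</r>";

    /** Parses without loading the external DTD and returns the root element's name. */
    static String rootName(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Xerces-specific feature; with the default (true) the parser would try to open the DTD.
        reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        StringBuilder root = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root.length() == 0) {
                    root.append(qName); // remember only the first (root) element
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName(SAMPLE));
    }
}
```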

If you're using another parser like Crimson, your best bet is to set up an EntityResolver that resolves the DTD without actually reading the separate file.

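import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import java.io.StringReader;

// Resolves every external entity (including the external DTD) to an empty document,
// so the parser never opens the real file.
public class NoOpEntityResolver implements EntityResolver {
  @Override
  public InputSource resolveEntity(String publicId, String systemId) {
    // StringBufferInputStream is deprecated; a StringReader supplies the same empty content
    return new InputSource(new StringReader(""));
  }
}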

Then in the builder...

builder.setEntityResolver(new NoOpEntityResolver());
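The resolver approach can be sketched with the JDK's SAX parser as well. Here a lambda plays the role of `NoOpEntityResolver`, and the document with its missing DTD is again hypothetical; because the resolver hands back empty content, no file is opened and the parse should still reach the root element.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NoOpResolverDemo {
    // Hypothetical document whose external DTD does not exist anywhere.
    static final String SAMPLE = "<!DOCTYPE r SYSTEM 'no-such-file.dtd'><r>text</r>";

    /** Parses with a resolver that returns empty content and returns the root element's name. */
    static String rootName(String xml) throws Exception {
        XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
        // Same idea as NoOpEntityResolver: every external entity, the DTD included,
        // is replaced by an empty document.
        reader.setEntityResolver((publicId, systemId) -> new InputSource(new StringReader("")));
        StringBuilder root = new StringBuilder();
        reader.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if (root.length() == 0) {
                    root.append(qName); // remember only the first (root) element
                }
            }
        });
        reader.parse(new InputSource(new StringReader(xml)));
        return root.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootName(SAMPLE));
    }
}
```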

There is a downside to this approach. Any entities in the document will be resolved to the empty string and will effectively disappear. If your document has entities, you need to call setExpandEntities(false) and ensure the EntityResolver only suppresses the DocType.
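One way to make the resolver suppress only the DocType is to blank out external DTDs and return `null` for everything else, which tells a SAX parser to open the entity itself. This is a sketch under an explicit assumption: that DTDs can be recognised by a `.dtd` suffix on the system ID. `DtdOnlyEntityResolver` is a hypothetical name, not part of JDOM or SAX.

```java
import java.io.StringReader;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

/** Hypothetical resolver: blanks out external DTDs, defers every other entity to the parser. */
public class DtdOnlyEntityResolver implements EntityResolver {
    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        // Assumption: external DTDs are identified by a ".dtd" suffix on the system ID.
        if (systemId != null && systemId.toLowerCase().endsWith(".dtd")) {
            return new InputSource(new StringReader("")); // empty DTD, nothing is read
        }
        return null; // null lets the parser resolve the entity normally
    }

    public static void main(String[] args) {
        DtdOnlyEntityResolver resolver = new DtdOnlyEntityResolver();
        System.out.println(resolver.resolveEntity(null, "file:/data/export.dtd") != null);
        System.out.println(resolver.resolveEntity(null, "file:/data/chapter1.xml") == null);
    }
}
```

It would be installed with `builder.setEntityResolver(new DtdOnlyEntityResolver());` alongside `builder.setExpandEntities(false);`. A stricter variant could match the exact system ID of the DTD named in the document's DOCTYPE instead of relying on the file suffix.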
