Extremely slow XSLT transformation in Java


Question

I am trying to transform an XML document using XSLT. The input is the XHTML source of www.wordpress.org, and the XSLT is a dummy example that retrieves the site's title (it could just as well do nothing; the stylesheet makes no difference).

With every API or library I have tried, the transformation takes about 2 minutes! If you look at the wordpress.org source, you will see that it is only 183 lines long. From what I found by googling, this is probably due to DOM tree building. No matter how simple the XSLT is, it always takes 2 minutes, which supports the idea that the delay is related to DOM building, but in my opinion it still should not take 2 minutes.

Here is the example code (nothing special):

   TransformerFactory tFactory = TransformerFactory.newInstance();
   Transformer transformer = null;

   try {
       transformer = tFactory.newTransformer(
           new StreamSource("/home/pd/XSLT/transf.xslt"));

   } catch (TransformerConfigurationException e) {
       e.printStackTrace();
   }

   ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

   System.out.println("START");
   try {
       transformer.transform(new SAXSource(new InputSource(
           new FileInputStream("/home/pd/XSLT/wordpress.xml"))),
           new StreamResult(outputStream));
   } catch (TransformerException e) {       
       e.printStackTrace();
   } catch (IOException e) {
       e.printStackTrace();
   }
   System.out.println("STOP");

   System.out.println(new String(outputStream.toByteArray()));

Java "pauses" for 2 minutes between START and STOP. If I look at processor or memory usage, nothing increases. It really looks as if the JVM has stopped...

Do you have any experience transforming XML documents longer than 50 lines (an arbitrary number ;))? From what I have read, XSLT always needs to build a DOM tree in order to do its work. Fast transformation is crucial for me.

Thanks in advance,
Piotr

Recommended answer

Does the sample HTML file use namespaces? If so, your XML parser may be attempting to retrieve contents (a schema, perhaps) from the namespace URIs. The same applies to a DTD referenced from a DOCTYPE declaration, which a parser will normally download even when it is not validating. A network fetch is a likely cause if each run takes exactly two minutes: that points to one or more TCP timeouts.

You can verify this by timing the parse step on its own, separately from the transformation. The InputSource is only read once the parser runs, so parsing the WordPress XML is the step most likely to cause the delay. After reviewing the sample file you posted, it does include a declared namespace (xmlns="http://www.w3.org/1999/xhtml").
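To check where the time goes, the parse can be timed in isolation. The following is a minimal sketch rather than the asker's actual setup: the class name ParseTiming, the helper timeParseMillis, and the in-memory stand-in document are all assumptions made for illustration. Replacing the stand-in string with the real file contents would show whether parsing alone accounts for the two minutes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original question or answer.
class ParseTiming {

    /** Parses the XML text into a DOM and returns the elapsed time in milliseconds. */
    static long timeParseMillis(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();

        long start = System.nanoTime();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

        // Touch the result so the parse clearly happened before we report.
        if (doc.getDocumentElement() == null) {
            throw new IllegalStateException("parser returned an empty document");
        }
        return elapsedMillis;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in input (assumption); it has no DOCTYPE, so nothing is fetched.
        String xml = "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
                   + "<head><title>Sample</title></head><body/></html>";
        System.out.println("Parse took " + timeParseMillis(xml) + " ms");
    }
}
```

If the parse alone takes about as long as the whole run, the delay is in the parser (for example, waiting on a network fetch) and not in the XSLT processing.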

To work around this, you can implement your own EntityResolver, which essentially disables the URL-based resolution. You may need to use a DOM for this; see the setEntityResolver method of DocumentBuilder.

Here's a sample that uses a DOM and disables resolution (note that this is untested):

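import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class DomTransform {
    public static void main(String[] args) {
        try {
            DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
            dbFactory.setNamespaceAware(true); // XSLT expects a namespace-aware DOM
            DocumentBuilder db = dbFactory.newDocumentBuilder();
            db.setEntityResolver(new EntityResolver() {
                @Override
                public InputSource resolveEntity(String publicId, String systemId) {
                    // Returning null would tell the parser to open the URL itself,
                    // so return an empty source to suppress the fetch.
                    return new InputSource(new StringReader(""));
                }
            });

            System.out.println("BUILDING DOM");
            Document doc = db.parse(new File("/home/pd/XSLT/wordpress.xml"));

            ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

            TransformerFactory tFactory = TransformerFactory.newInstance();
            Transformer transformer = tFactory.newTransformer(
                new StreamSource(new File("/home/pd/XSLT/transf.xslt")));

            System.out.println("RUNNING TRANSFORM");
            // Pass the whole document so templates matching "/" see the root node.
            transformer.transform(new DOMSource(doc), new StreamResult(outputStream));

            System.out.println("TRANSFORMED CONTENTS BELOW");
            System.out.println(outputStream.toString());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}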

If you want to use SAX, you would have to use a SAXSource with an XMLReader that uses your custom resolver.
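The SAX variant might look like the sketch below. It is an illustration under stated assumptions, not tested against the asker's files: the class name SaxNoFetch, the transform helper, and the two in-memory sample strings are hypothetical, and the sample input assumes the page carries an XHTML DOCTYPE. The resolver answers every external entity request with an empty source, because returning null from resolveEntity asks the parser to open the URL itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical example class; names and sample data are assumptions.
class SaxNoFetch {

    // Stand-in input: an XHTML page whose DOCTYPE points at a remote DTD.
    static final String SAMPLE_XML =
        "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" "
      + "\"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">"
      + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
      + "<head><title>Hello</title></head><body/></html>";

    // Stand-in stylesheet: outputs the page title as plain text.
    static final String SAMPLE_XSLT =
        "<xsl:stylesheet version=\"1.0\" "
      + "xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\" "
      + "xmlns:x=\"http://www.w3.org/1999/xhtml\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\">"
      + "<xsl:value-of select=\"/x:html/x:head/x:title\"/>"
      + "</xsl:template></xsl:stylesheet>";

    /** Transforms xml with xslt through a SAX reader that never fetches external entities. */
    static String transform(String xml, String xslt) throws Exception {
        SAXParserFactory spf = SAXParserFactory.newInstance();
        spf.setNamespaceAware(true);
        XMLReader reader = spf.newSAXParser().getXMLReader();

        // Supply an empty source for every external entity (such as the DTD)
        // so the parser has nothing to download.
        reader.setEntityResolver(
            (publicId, systemId) -> new InputSource(new StringReader("")));

        Transformer transformer = TransformerFactory.newInstance()
            .newTransformer(new StreamSource(new StringReader(xslt)));

        StringWriter out = new StringWriter();
        transformer.transform(
            new SAXSource(reader, new InputSource(new StringReader(xml))),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform(SAMPLE_XML, SAMPLE_XSLT));
    }
}
```

One trade-off of an empty DTD: named entities that the real DTD defines, such as &nbsp;, become undefined, so a document that uses them will fail to parse. In that case the resolver could return a local copy of the DTD instead of an empty source.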
