Java XPath (Apache JAXP implementation) performance


Question

Note: if you are affected by this issue as well, please vote for it in the Apache JIRA:

https://issues.apache.org/jira/browse/XALANJ-2540

I have come to the astonishing conclusion that this:

Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
String result = e.getTextContent();

seems to be an incredible 100x faster than this:

// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();

// Negligible
XPath xpath = factory.newXPath();

// Negligible
XPathExpression expression = xpath.compile("//SomeElementName");

// Accounts for 70%
String result = (String) expression.evaluate(document, XPathConstants.STRING);

I'm using the JVM's default implementation of JAXP:

org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl

I'm really confused, because it's easy to see how JAXP could optimise the above XPath query to actually execute a simple getElementsByTagName() instead. But it doesn't seem to do that. This problem is limited to around 5-6 frequently used XPath calls, that are abstracted and hidden by an API. Those queries involve simple paths (e.g. /a/b/c, no variables, conditions) against an always available DOM Document only. So, if an optimisation can be done, it will be quite easy to achieve.

My question: Is XPath's slowness an accepted fact, or am I overlooking something? Is there a better (faster) implementation? Or should I just avoid XPath altogether, for simple queries?
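
For readers who want to reproduce the comparison, here is a minimal timing sketch. It is not the original test case: the class name `XPathVsDomTiming`, the tiny in-memory XML document, and the iteration count are all assumptions made for illustration, and a loop timed with `System.nanoTime()` is only a rough indicator.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathVsDomTiming {

    // Parses an in-memory XML string; the content used below is made up for illustration.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // DOM lookup: text content of the first element with the given tag name.
    static String viaDom(Document document, String name) {
        return document.getElementsByTagName(name).item(0).getTextContent();
    }

    // XPath lookup, deliberately unoptimised: a new factory, XPath and
    // compiled expression on every call, as in the question.
    static String viaXPath(Document document, String name) throws Exception {
        return (String) XPathFactory.newInstance()
                .newXPath()
                .compile("//" + name)
                .evaluate(document, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        Document document = parse("<root><SomeElementName>value</SomeElementName></root>");
        int iterations = 1_000; // arbitrary; increase for steadier numbers

        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            viaDom(document, "SomeElementName");
        }
        long domNanos = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            viaXPath(document, "SomeElementName");
        }
        long xpathNanos = System.nanoTime() - start;

        System.out.println("DOM:   " + domNanos / 1_000_000 + " ms");
        System.out.println("XPath: " + xpathNanos / 1_000_000 + " ms");
    }
}
```

The absolute numbers will depend on the JVM, the XPath implementation on the classpath, and the document size, so they should only be compared with each other on the same machine.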

Recommended answer

I have debugged and profiled my test case and Xalan/JAXP in general. I managed to identify the major problem in

org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()

It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to look up the DTMManager instance in some sort of default configuration. This configuration is not loaded into memory but accessed every time. Furthermore, this access seems to be protected by a lock on ObjectFactory.class itself. When the access fails (which it does by default), the configuration is loaded from the xalan.jar file's

META-INF/services/org.apache.xml.dtm.DTMManager

configuration file. Every time!

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault

or, when the copy of Xalan bundled with the JDK is used (the com.sun.org.apache packages):

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

The above works because it lets lookUpFactoryClassName() skip its expensive work when the factory class name is the default anyway:

// Code from com.sun.org.apache.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(String factoryId,
                                     String propertiesFilename,
                                     String fallbackClassName) {
  SecuritySupport ss = SecuritySupport.getInstance();

  try {
    String systemProp = ss.getSystemProperty(factoryId);
    if (systemProp != null) { 

      // Return early from the method
      return systemProp;
    }
  } catch (SecurityException se) {
  }

  // [...] "Heavy" operations later
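
Since the excerpt above returns early whenever the system property is set, the same effect might be achievable by setting the property programmatically before the first XPath evaluation, instead of passing a `-D` flag. The sketch below is an untested assumption along those lines, not something covered by the measurements in this answer; the class name `DtmManagerProperty` and the method `install()` are hypothetical, while the property names and values are the ones given above.

```java
public class DtmManagerProperty {

    // Property names and values copied from the JVM parameters shown above.
    static final String XALAN_KEY = "org.apache.xml.dtm.DTMManager";
    static final String XALAN_VALUE = "org.apache.xml.dtm.ref.DTMManagerDefault";
    static final String JDK_KEY = "com.sun.org.apache.xml.internal.dtm.DTMManager";
    static final String JDK_VALUE = "com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault";

    // Sets each property only when it is not already defined,
    // so an explicit -D flag on the command line still takes precedence.
    static void install() {
        if (System.getProperty(XALAN_KEY) == null) {
            System.setProperty(XALAN_KEY, XALAN_VALUE);
        }
        if (System.getProperty(JDK_KEY) == null) {
            System.setProperty(JDK_KEY, JDK_VALUE);
        }
    }

    public static void main(String[] args) {
        // Assumption: this runs before any XPath expression is evaluated.
        install();
        System.out.println(XALAN_KEY + "=" + System.getProperty(XALAN_KEY));
        System.out.println(JDK_KEY + "=" + System.getProperty(JDK_KEY));
    }
}
```

Whether this makes a measurable difference depends on the Xalan or JDK version in use, so it is worth timing before and after.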

So here's a performance improvement overview for 10k consecutive XPath evaluations of //SomeNodeName against a 90k XML file (measured with System.nanoTime()):

measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
--------------------------------------------------------------------------------
without optimisation    :     10400ms |      4717ms |              |     25500ms
reusing XPathFactory    :      5995ms |      2829ms |              |
reusing XPath           :      5900ms |      2890ms |              |
reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
adding the JVM param    :      1163ms |       761ms |        n/a   |

Note that the benchmark was a very primitive one. It may well be that your own benchmark will show Saxon outperforming Xalan.
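
The "reusing XPathFactory / XPath / XPathExpression" rows amount to compiling the expression once and keeping it around. A minimal sketch of that idea follows, using only the standard `javax.xml.xpath` API; the class name `CachedXPathLookup` and the sample XML are assumptions for illustration and do not come from the original test case.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class CachedXPathLookup {

    // Compiled once in the constructor and reused for every evaluation.
    private final XPathExpression expression;

    CachedXPathLookup(String path) throws XPathExpressionException {
        this.expression = XPathFactory.newInstance().newXPath().compile(path);
    }

    // Returns the string value of the first node matched by the expression.
    String evaluate(Document document) throws XPathExpressionException {
        return expression.evaluate(document);
    }

    // Helper for the demo: parses an in-memory XML string.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        CachedXPathLookup lookup = new CachedXPathLookup("//SomeElementName");
        Document document = parse("<root><SomeElementName>value</SomeElementName></root>");
        for (int i = 0; i < 3; i++) {
            // No factory creation or compilation inside the loop.
            System.out.println(lookup.evaluate(document));
        }
    }
}
```

One caveat with this design: the JAXP documentation states that XPathExpression objects are not thread-safe, so a cached instance like this should be confined to one thread or guarded accordingly.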

I have filed this as a bug with the Xalan developers at Apache:

https://issues.apache.org/jira/browse/XALANJ-2540
