Java XPath (Apache JAXP implementation) performance

Question

NOTE: If you experience this issue as well, please upvote it on Apache JIRA:

https://issues.apache.org/jira/browse/XALANJ-2540

I have come to the astonishing conclusion that this:

Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
String result = e.getTextContent();

seems to be an incredible 100x faster than this:

// Accounts for 30%, can be cached
XPathFactory factory = XPathFactory.newInstance();

// Negligible
XPath xpath = factory.newXPath();

// Negligible
XPathExpression expression = xpath.compile("//SomeElementName");

// Accounts for 70%
String result = (String) expression.evaluate(document, XPathConstants.STRING);

I'm using the JVM's default implementation of JAXP:

org.apache.xpath.jaxp.XPathFactoryImpl
org.apache.xpath.jaxp.XPathImpl

I'm really confused, because it's easy to see how JAXP could optimise the above XPath query to actually execute a simple getElementsByTagName() instead. But it doesn't seem to do that. This problem is limited to around 5-6 frequently used XPath calls, that are abstracted and hidden by an API. Those queries involve simple paths (e.g. /a/b/c, no variables, conditions) against an always available DOM Document only. So, if an optimisation can be done, it will be quite easy to achieve.

My question: Is XPath's slowness an accepted fact, or am I overlooking something? Is there a better (faster) implementation? Or should I just avoid XPath altogether, for simple queries?

Solution

I have debugged and profiled my test case and Xalan/JAXP in general. I managed to identify the major problem in

org.apache.xml.dtm.ObjectFactory.lookUpFactoryClassName()

It can be seen that every one of the 10k test XPath evaluations led to the classloader trying to look up the DTMManager instance in some sort of default configuration. This configuration is not cached in memory but accessed every time. Furthermore, this access seems to be protected by a lock on ObjectFactory.class itself. When the access fails (which it does by default), the configuration is loaded from the xalan.jar file's

META-INF/services/org.apache.xml.dtm.DTMManager

configuration file. Every time!

Fortunately, this behaviour can be overridden by specifying a JVM parameter like this:

-Dorg.apache.xml.dtm.DTMManager=org.apache.xml.dtm.ref.DTMManagerDefault

or

-Dcom.sun.org.apache.xml.internal.dtm.DTMManager=com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault

The above works because it bypasses the expensive work in lookUpFactoryClassName() when the factory class name is the default anyway:

// Code from com.sun.org.apache.xml.internal.dtm.ObjectFactory
static String lookUpFactoryClassName(String factoryId,
                                     String propertiesFilename,
                                     String fallbackClassName) {
  SecuritySupport ss = SecuritySupport.getInstance();

  try {
    String systemProp = ss.getSystemProperty(factoryId);
    if (systemProp != null) { 

      // Return early from the method
      return systemProp;
    }
  } catch (SecurityException se) {
  }

  // [...] "Heavy" operations later
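
Where changing the JVM launch flags is not practical, the same early return can presumably be triggered by setting the system property in code before the first XPath evaluation. The following is only a minimal sketch under that assumption: the class name DtmManagerProperty and its install() method are made up for illustration, and whether the property is still consulted depends on the Xalan or JDK version in use, so measure before relying on it.

```java
// Hypothetical helper, not part of Xalan or the JDK.
final class DtmManagerProperty {

    // Property names and values copied from the JVM parameters above.
    static final String XALAN_KEY = "org.apache.xml.dtm.DTMManager";
    static final String XALAN_VALUE = "org.apache.xml.dtm.ref.DTMManagerDefault";
    static final String JDK_KEY = "com.sun.org.apache.xml.internal.dtm.DTMManager";
    static final String JDK_VALUE = "com.sun.org.apache.xml.internal.dtm.ref.DTMManagerDefault";

    private DtmManagerProperty() {
    }

    // Sets each property only if it is not defined yet, so an explicit -D flag still wins.
    static void install() {
        setIfAbsent(XALAN_KEY, XALAN_VALUE);
        setIfAbsent(JDK_KEY, JDK_VALUE);
    }

    private static void setIfAbsent(String key, String value) {
        if (System.getProperty(key) == null) {
            System.setProperty(key, value);
        }
    }

    public static void main(String[] args) {
        install();
        System.out.println(XALAN_KEY + "=" + System.getProperty(XALAN_KEY));
        System.out.println(JDK_KEY + "=" + System.getProperty(JDK_KEY));
    }
}
```

Calling DtmManagerProperty.install() once during application start-up, before any XPath object is created, should be equivalent to passing the flags on the command line.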

So here's a performance improvement overview for 10k consecutive XPath evaluations of //SomeNodeName against a 90k XML file (measured with System.nanoTime()):

measured library        : Xalan 2.7.0 | Xalan 2.7.1 | Saxon-HE 9.3 | jaxen 1.1.3
--------------------------------------------------------------------------------
without optimisation    :     10400ms |      4717ms |              |     25500ms
reusing XPathFactory    :      5995ms |      2829ms |              |
reusing XPath           :      5900ms |      2890ms |              |
reusing XPathExpression :      5800ms |      2915ms |      16000ms |     25000ms
adding the JVM param    :      1163ms |       761ms |        n/a   |
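
The "reusing" rows in the table correspond to keeping the factory, the XPath object and the compiled expressions around between calls. One way this could look is sketched below; the class CachedXPath and its methods are hypothetical names, not part of JAXP. The sketch is not thread-safe, since JAXP documents XPath objects as not thread-safe, so one instance per thread is assumed.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: one XPath instance plus a cache of compiled expressions.
// Not thread-safe; use one instance per thread.
final class CachedXPath {

    // Created once, so XPathFactory.newInstance() is not paid on every call.
    private final XPath xpath = XPathFactory.newInstance().newXPath();
    private final Map<String, XPathExpression> compiled = new HashMap<>();

    // Evaluates the expression as a string, compiling it only on first use.
    String evaluate(String expression, Document document) throws XPathExpressionException {
        XPathExpression cached = compiled.get(expression);
        if (cached == null) {
            cached = xpath.compile(expression);
            compiled.put(expression, cached);
        }
        return cached.evaluate(document);
    }

    // Number of distinct expressions compiled so far.
    int cachedExpressionCount() {
        return compiled.size();
    }

    public static void main(String[] args) throws Exception {
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<root><SomeElementName>hello</SomeElementName></root>")));
        CachedXPath cache = new CachedXPath();
        System.out.println(cache.evaluate("//SomeElementName", document));
    }
}
```

As the table suggests, this reuse roughly halved the time in my test, while the JVM parameter accounted for most of the remaining gain.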

Note that the benchmark was a very primitive one. It may well be that your own benchmark will show that Saxon outperforms Xalan.
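
For anyone who wants to repeat the measurement on their own data, a similarly primitive harness could look like the sketch below. It is not the original test code: the class name XPathTiming is made up, the inline XML is a tiny stand-in for the 90k file, and there is no warm-up phase or statistical treatment, so the numbers are only a rough indication.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical, deliberately primitive harness: no warm-up phase, no statistics.
final class XPathTiming {

    // Plain DOM lookup, as in the question.
    static String viaDom(Document document) {
        Element e = (Element) document.getElementsByTagName("SomeElementName").item(0);
        return e.getTextContent();
    }

    // Evaluation of an already compiled XPath expression.
    static String viaXPath(XPathExpression expression, Document document)
            throws XPathExpressionException {
        return (String) expression.evaluate(document, XPathConstants.STRING);
    }

    public static void main(String[] args) throws Exception {
        // Tiny stand-in document; substitute your own XML file here.
        Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(
                        "<root><SomeElementName>hello</SomeElementName></root>")));
        XPathExpression expression =
                XPathFactory.newInstance().newXPath().compile("//SomeElementName");
        int runs = 10_000;
        long sink = 0; // consume the results so the loops are not optimised away

        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += viaDom(document).length();
        }
        long domMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            sink += viaXPath(expression, document).length();
        }
        long xpathMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("DOM:   " + domMillis + "ms");
        System.out.println("XPath: " + xpathMillis + "ms");
        System.out.println("(checksum " + sink + ")");
    }
}
```

Running it once with and once without the JVM parameter shows whether the lookup overhead described above applies to the JAXP implementation on your classpath.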

I have filed this as a bug to the Xalan guys at Apache:

https://issues.apache.org/jira/browse/XALANJ-2540
