XPath.evaluate performance slows down (absurdly) over multiple calls

Problem description

I am trying to use the javax.xml.xpath package to run XPath expressions on a document with multiple namespaces, and I'm having goofy performance problems.

My test document is pulled from a real production example. It is about 600 KB of XML, and it is a fairly complex Atom feed.

I realize that what I'm doing with XPath could be done without. However, the same implementation on other, vastly inferior platforms performs absurdly better. Right now, rebuilding my system to not use XPath is beyond the scope of what I can do in the time that I have.

My test code is something like this:



void testXPathPerformance()
{
    DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
    factory.setNamespaceAware(true);
    DocumentBuilder builder = factory.newDocumentBuilder();

    Document doc = builder.parse(loadTestDocument());

    XPathFactory xpf = XPathFactory.newInstance();
    XPath xp = xpf.newXPath();

    NamespaceContext names = loadTestNamespaces();
    //there are 12 namespaces in names.  In this example code, I'm using
    //'samplens' instead of the actual namespaces that my application uses
    //for simplicity.  In my real code, the queries are different text, but
    //precisely the same complexity.

    xp.setNamespaceContext(names);

    NodeList nodes = (NodeList) xp.evaluate("/atom:feed/atom:entry",
                     doc.getDocumentElement(), XPathConstants.NODESET);


    for(int i=0;i<nodes.getLength();i++)
    {
        printTimestamp(1);
        xp.evaluate("atom:id/text()", nodes.item(i));
        printTimestamp(2);
        xp.evaluate("samplens:fieldA/text()", nodes.item(i));
        printTimestamp(3);
        xp.evaluate("atom:author/atom:uri/text()", nodes.item(i));
        printTimestamp(4);
        xp.evaluate("samplens:fieldA/samplens:fieldB/@attrC", nodes.item(i));
        printTimestamp(5);

        //etc.  My real example has 10 of these xp.evaluate lines

     }
}
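
The loadTestNamespaces() helper is not shown in the question. As a minimal sketch of what such a helper might return, assuming a plain prefix-to-URI map, something like the following would do. The class name MapNamespaceContext and its bind method are hypothetical and are not part of javax.xml; only the three NamespaceContext methods are required by the interface.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;

/** Hypothetical stand-in for loadTestNamespaces(): a map-backed NamespaceContext. */
final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> prefixToUri = new HashMap<>();

    /** Registers a prefix; returns this so that calls can be chained. */
    MapNamespaceContext bind(String prefix, String uri) {
        prefixToUri.put(prefix, uri);
        return this;
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) {
            throw new IllegalArgumentException("prefix must not be null");
        }
        String uri = prefixToUri.get(prefix);
        // Unbound prefixes map to the empty string, as the interface specifies.
        return uri != null ? uri : XMLConstants.NULL_NS_URI;
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> it = getPrefixes(namespaceURI);
        return it.hasNext() ? it.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        // Collect every prefix bound to the given URI.
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : prefixToUri.entrySet()) {
            if (e.getValue().equals(namespaceURI)) {
                matches.add(e.getKey());
            }
        }
        return matches.iterator();
    }
}
```

With this sketch, loadTestNamespaces() could be as simple as returning new MapNamespaceContext().bind("atom", "http://www.w3.org/2005/Atom") plus one bind call per additional prefix.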

When I run on a Nexus One (not in the debugger, but with USB connected), the first time through the loop, each xp.evaluate takes somewhere from 10ms to 20ms. By the 15th time through the loop, each xp.evaluate takes somewhere from 200ms to 300ms. By the end of the loop (there are 150 items in nodes), each xp.evaluate takes about 500ms-600ms.

I've tried using xp.compile(). The compiles all take <5ms. I've done xp.reset() (makes no difference). I've done a new XPath object for each evaluate (adds about 4ms).

Memory usage does not appear to spiral out of control during execution.

I'm running this on a single thread in a JUnit test case that doesn't create an activity or anything.

I'm really puzzled.

Does anybody have any idea what else to try?

Thanks!

Update

If I run the for loop backwards (for(int i=nodes.getLength()-1;i>=0;i--)), then the first few nodes take 500ms-600ms, and the last ones are fast (10ms-20ms). So this seems to have nothing to do with the number of calls; instead, expressions whose context is near the end of the document take longer than expressions whose context is near the beginning of the document.

Does anybody have any thoughts on what I can do about this?

Recommended answer

Try adding this code at the top of the loop body:

Node singleNode = nodes.item(i);
singleNode.getParentNode().removeChild(singleNode);

Then run each evaluation against the singleNode variable instead of nodes.item(i). (You can of course name the variable whatever you like.)

Doing this detaches the node you are working with from the large main document. This will speed up the evaluate method's processing time by a huge amount.

Example:

for(int i=0;i<nodes.getLength();i++)
{
    Node singleNode = nodes.item(i);
    singleNode.getParentNode().removeChild(singleNode);

    printTimestamp(1);
    xp.evaluate("atom:id/text()", singleNode );
    printTimestamp(2);
    xp.evaluate("samplens:fieldA/text()", singleNode );
    printTimestamp(3);
    xp.evaluate("atom:author/atom:uri/text()", singleNode );
    printTimestamp(4);
    xp.evaluate("samplens:fieldA/samplens:fieldB/@attrC", singleNode );
    printTimestamp(5);

    //etc.  My real example has 10 of these xp.evaluate lines

 }
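
For readers who want to try the detach-then-evaluate pattern in isolation, here is a minimal self-contained sketch. It assumes a tiny feed without namespaces, so no NamespaceContext is needed; the class name DetachBeforeEvaluate and the method entryIds are hypothetical names chosen for illustration. It does not measure timing, and it only shows that queries relative to a detached node still return the expected values.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: detaches each entry from the document before querying it. */
final class DetachBeforeEvaluate {

    /** Returns the text of each entry's id element, in document order. */
    static List<String> entryIds(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            XPath xp = XPathFactory.newInstance().newXPath();
            // Select all entries once, while they are still attached.
            NodeList nodes = (NodeList) xp.evaluate(
                    "/feed/entry", doc, XPathConstants.NODESET);

            List<String> ids = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Node entry = nodes.item(i);
                Node parent = entry.getParentNode();
                if (parent != null) {
                    // Detach: the entry becomes the root of its own small tree.
                    parent.removeChild(entry);
                }
                // Relative query, evaluated against the detached entry only.
                ids.add(xp.evaluate("id/text()", entry));
            }
            return ids;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }
}
```

Note that this approach mutates the parsed Document: after the loop, the entries are no longer children of the feed element. If the document is needed afterwards, it has to be parsed again or the nodes reattached.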
