Increase performance of PHP DOM-XML: currently takes too long

Question

I have an array that contains 7000+ values:

$arrayIds = [
    'A001',
    ...,
    'A7500'
];

For each value, this foreach loop gets the id attribute of the matching unit nodes in a given XML file:

$dom = new DOMDocument;
$dom->load('myxml.xml');

$xp = new DOMXPath($dom);

$data = [];

foreach ($arrayIds as $arrayId) {
    $expression = "//unit[@person-name=\"$arrayId\"]/@id";
    $col = $xp->query($expression);

    if ($col && $col->length) {
        foreach ($col as $node) {
            $data[] = $node->nodeValue;
        }
    }
}

It takes approximately 70 seconds. I can't wait longer than 5 seconds.

What is the fastest way to achieve this?

A snippet of the XML file:

<unit person-name="A695" id="PTU-300" xml:space="preserve">
    <source xml:lang="en">Related tutorials</source>
    <seg-source><mrk mid="0" mtype="seg">Related tutorials</mrk></seg-source>
    <target xml:lang="id"><mrk mid="0" mtype="seg">Related tutorials</mrk></target>
</unit>
<unit person-name="A001" id="PTU-4" xml:space="preserve">
    <source xml:lang="en">Related tutorials</source>
    <seg-source><mrk mid="0" mtype="seg">Related tutorials</mrk></seg-source>
    <target xml:lang="id"><mrk mid="0" mtype="seg">Related tutorials</mrk></target>
</unit>
...
<unit>
...
</unit>

For what it's worth, I'm doing this on an M1 Mac.

Recommended answer

I think the problem is the way you use XPath to find the elements. Each query (one per name) searches the whole document, even if the match is the very first unit. This is because the expression could match several nodes, so it doesn't know to stop after finding the first one. With 7000+ names, that means 7000+ full passes over the document.

Instead, this uses a single XPath query to find all of the person-name attributes and checks each one against the list of names you are looking for. If it is in the list, it reads the id from the same unit element and adds it to the results.

It's hard for me to measure how long this will take; it's easier for you to test it against your actual file...

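// Every requested name becomes a key in the result (null = not found)
$data = array_fill_keys($arrayIds, null);
// Flip the list so the names are keys and isset() can be used for lookups
$arrayIds = array_flip($arrayIds);
// One query over the whole document instead of one query per name
$expression = "//unit/@person-name";
$cols = $xp->query($expression);
foreach ($cols as $col) {
    if (isset($arrayIds[$col->nodeValue])) {
        // ownerElement is the <unit> element that carries this attribute;
        // getAttribute() returns "" instead of failing if there is no id
        $data[$col->nodeValue] = $col->ownerElement->getAttribute("id");
    }
}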

Using array_flip() turns the names to search for into the array's keys, so isset() can be used instead of searching through the array (for example with in_array(), which checks every element).
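
A small standalone illustration of the array_flip() and isset() combination; the IDs here are hypothetical stand-ins for the real 7000+ values:

```php
<?php
// Hypothetical IDs standing in for the real $arrayIds list
$arrayIds = ['A001', 'A002', 'A695'];

// array_flip() swaps keys and values: ['A001' => 0, 'A002' => 1, 'A695' => 2]
$lookup = array_flip($arrayIds);

// isset() on a key is a hash lookup; it is safe here because the
// flipped values are integer positions, never null
var_dump(isset($lookup['A695'])); // bool(true)
var_dump(isset($lookup['B123'])); // bool(false)

// in_array() gives the same answer but scans the list element by element
var_dump(in_array('A695', $arrayIds, true)); // bool(true)
```

With 7000+ names and one check per unit, the difference between a hash lookup and a linear scan adds up quickly.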

I've used the name as the key of the output array, so you get something like...

Array
(
    [A001] => PTU-4
)
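
One difference from the question's loop: the code above stores a single id per name, so if several unit elements share the same person-name, only the last id is kept, whereas the original loop collected every match. If duplicates matter, here is a sketch that keeps them with the same single-query approach. The inline XML is a trimmed, hypothetical sample modeled on the snippet in the question (the second A001 unit is made up to show the duplicate case):

```php
<?php
// Trimmed sample modeled on the question's snippet; the second A001 unit is hypothetical
$xml = <<<'XML'
<file>
  <unit person-name="A695" id="PTU-300"/>
  <unit person-name="A001" id="PTU-4"/>
  <unit person-name="A001" id="PTU-9"/>
  <unit person-name="A042" id="PTU-7"/>
</file>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$xp = new DOMXPath($dom);

$arrayIds = ['A001', 'A695', 'A999'];
$lookup = array_flip($arrayIds);

// Every requested name starts with an empty list (stays empty if not found)
$data = array_fill_keys($arrayIds, []);

// Single pass: one XPath query over all person-name attributes
foreach ($xp->query('//unit/@person-name') as $attr) {
    if (isset($lookup[$attr->nodeValue])) {
        // ownerElement is the <unit> that carries this attribute
        $data[$attr->nodeValue][] = $attr->ownerElement->getAttribute('id');
    }
}

print_r($data);
```

The ids for each name stay in document order, which matches what the question's per-name queries return.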
