Large PHP for loop with SimpleXMLElement very slow: memory issues?

Problem description

I currently have a bit of PHP code that pulls data in from an XML file and creates a SimpleXML object using $products = new SimpleXMLElement($xmlString);. I then loop over this with a for loop, within which I set the product details for each product in the XML document. Each product is then saved to a MySQL database.

While running this script, the rate at which products are added decreases until they eventually stop before reaching the maximum. I have tried running garbage collection at intervals, to no avail, as well as unsetting various variables, which doesn't seem to work.

Part of the code is shown below:

<?php
$servername = "localhost";
$username = "database.database";
$password = "demwke";
$database = "databasename";
$conn = new mysqli($servername, $username, $password, $database);

$file = "large.xml";
$xmlString = file_get_contents($file);
$products = new SimpleXMLElement($xmlString);
unset($xmlString, $file);
$total = count($products->datafeed[0]);

echo 'Starting<br><br>';

for($i=0;$i<$total;$i++){
    $id = $products->datafeed->prod[$i]['id'];
    // etc etc (the remaining product fields are set in the same way)
    $sql = "INSERT INTO products (id, name, uid, cat, prodName, brand, desc, link, imgurl, price, subcat) VALUES ('$id', '$store', '$storeuid', '$category', '$prodName', '$brand', '$prodDesc', '$link', '$image', '$price', '$subCategory')";
}
echo '<br>Finished';
?>

The PHP variables are all defined using lines similar to the $id one, but they have been removed to make this easier to read.

Any ideas on what I can do or read up on to get this to complete? The time it takes doesn't really matter to me, as long as it eventually finishes.

Recommended answer

Update: never use indexes with SimpleXML unless you have very few objects. Use foreach instead:

// Before, with [index]:
for ($i=0;$i<$total;$i++) {
    $id = $products->datafeed->prod[$i]['id'];
    ...

// After, with foreach():
$i = 0;
foreach ($products->datafeed->prod as $prod) {
    $i++; // Remove if you don't actually need $i
    $id = $prod['id'];
    ...

In general, ...->node[$i] will access the array node[] and read through it up to the desired index, so iterating over the node array is not O(N) but O(N²). There is no workaround, because there is no guarantee that when you access item K you have just accessed item K-1 (and so on, recursively). foreach saves the pointer and therefore works in O(N).

For the same reason, it might be advantageous to iterate over the whole array with foreach even if you only need a few known items (unless they are very few and very near the beginning of the array):

// Before, with [index]:
    $a[0] = $products->datafeed->prod[15]['id'];
    ...
    $a[35] = $products->datafeed->prod[1293]['id'];

// After, with foreach():
$want = [ 15, /* ... */ 1293 ];
$i = 0;
foreach ($products->datafeed->prod as $prod) {
    if (!in_array(++$i, $want)) {
        continue;
    }
    $a[] = $prod['id'];
}


You should first verify whether the increasing delay is caused by MySQLi or by the XML processing. You can comment out the SQL query execution, and nothing else, in the loop, to verify whether the speed (granted, it will now be much higher... :-) ) stays constant or shows the same decrease.
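
A minimal sketch of that check, keeping the original indexed loop (assumption: the elided part of the question's loop executes the insert with $conn->query($sql); that line is not shown in the question):

$tick = microtime(true);
for ($i = 0; $i < $total; $i++) {
    $id = $products->datafeed->prod[$i]['id'];
    // ... build $sql from the other fields as in the original code ...
    // $conn->query($sql);  // commented out to measure the XML access alone
    if ($i > 0 && $i % 5000 === 0) {
        // Report how long the last batch of 5000 iterations took.
        printf("%d rows processed, last batch took %.3f s\n", $i, microtime(true) - $tick);
        $tick = microtime(true);
    }
}

If the per-batch times keep growing even with the query commented out, the XML access pattern is the cause rather than MySQLi.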

I suspect that the XML processing is the culprit, here:

for($i=0;$i<$total;$i++){
    $id = $products->datafeed->prod[$i]['id'];

...where you access an index farther and farther into a SimpleXMLElement object. This might suffer from Schlemiel the Painter's problem.

The straight answer to your question, "how do I get the loop to complete, no matter the time", is "increase memory limit and max execution time".
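
For example, a minimal sketch (the values below are placeholders to adjust for your environment, not recommendations):

// Raise the limits near the top of the script.
ini_set('memory_limit', '1024M');  // placeholder value; -1 removes the limit entirely
set_time_limit(0);                 // 0 means no execution time limit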

To improve performance, you can use a different interface to iterate over the feed object:

$i = -1;
foreach ($products->datafeed->prod as $prod) {
    $i++;
    $id = $prod['id'];
    ...
}

Experimenting

I use this small program to read a large XML and iterate its content:

// Stage 1. Create a large XML.
$xmlString = '<?xml version="1.0" encoding="UTF-8" ?>';
$xmlString .= '<content><package>';
for ($i = 0; $i < 100000; $i++) {
    $xmlString .=  "<entry><id>{$i}</id><text>The quick brown fox did what you would expect</text></entry>";
}
$xmlString .= '</package></content>';

// Stage 2. Load the XML.
$xml    = new SimpleXMLElement($xmlString);

$tick   = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $id = $xml->package->entry[$i]->id;
    if (0 === ($id % 5000)) {
        $t = microtime(true) - $tick;
        print date("H:i:s") . " id = {$id} at {$t}\n";
        $tick = microtime(true);
    }
}

After generating the XML, a loop parses it and prints how long it takes to iterate over 5000 elements. To verify that it really is the time delta, the date is printed as well; the delta should be approximately the difference between consecutive timestamps.

21:22:35 id = 0 at 2.7894973754883E-5
21:22:35 id = 5000 at 0.38135695457458
21:22:38 id = 10000 at 2.9452259540558
21:22:44 id = 15000 at 5.7002019882202
21:22:52 id = 20000 at 8.0867099761963
21:23:02 id = 25000 at 10.477082967758
21:23:15 id = 30000 at 12.81209897995
21:23:30 id = 35000 at 15.120756149292

This is what happens: processing the XML array becomes slower and slower.

This is mostly the same program using foreach:

// Stage 1. Create a large XML.
$xmlString = '<?xml version="1.0" encoding="UTF-8" ?>';
$xmlString .= '<content><package>';
for ($i = 0; $i < 100000; $i++) {
    $xmlString .=  "<entry><id>{$i}</id><text>The quick brown fox did ENTRY {$i}.</text></entry>";
}
$xmlString .= '</package></content>';

// Stage 2. Load the XML.
$xml    = new SimpleXMLElement($xmlString);

$i      = 0;
$tick   = microtime(true);
foreach ($xml->package->entry as $data) {
    // $id = $xml->package->entry[$i]->id;
    $id = $data->id;
    $i++;
    if (0 === ($id % 5000)) {
        $t = microtime(true) - $tick;
        print date("H:i:s") . " id = {$id} at {$t} ({$data->text})\n";
        $tick = microtime(true);
    }
}

The times seem to be constant now... I say "seem" because they appear to have decreased by a factor of about ten thousand, and I have some difficulties in getting reliable measurements.

(And no, I had no idea. I probably never used indexes with large XML arrays).

21:33:42 id = 0 at 3.0994415283203E-5 (The quick brown fox did ENTRY 0.)
21:33:42 id = 5000 at 0.0065329074859619 (The quick brown fox did ENTRY 5000.)
...
21:33:42 id = 95000 at 0.0065121650695801 (The quick brown fox did ENTRY 95000.)
