How to use PHP to parse a large XML file sequentially

Question

I'm trying to parse a moderately large XML file (6 MB) in PHP using SimpleXML. The script takes each record from the XML file, checks whether it has already been imported and, if it hasn't, updates/inserts that record into my own db.

The problem is I'm constantly getting a Fatal error about exceeding memory allocation:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 256 bytes) in /.../system/database/drivers/mysql/mysql_result.php on line 162

I avoided that error by using the following line to increase max memory allocation (following tip from here):

ini_set('memory_limit', '-1');

However, I then run up against the max execution time of 60 seconds and, for whatever reason, my server (XAMPP on Mac OS X) won't let me increase it. The script simply won't run if I include a line like this:

set_time_limit(240);

This all seems very inefficient, however; shouldn't I be able to break the file up somehow and process it sequentially? In the controller below I have a counter variable ($cycle) to keep track of which record I'm on, but I can't figure out how to use it so that the script doesn't have to process the whole XML file.

The controller (I'm using CodeIgniter) has this basic structure:

    $f = base_url().'data/data.xml';
    if ($data = file_get_contents($f))
    {
        $cycle = 0;
        $xml = new SimpleXMLElement($data);
        foreach ($xml->person as $p)
        {
            // this makes a single call to the db for a single field, based on the id of the record in the XML file
            if ($this->_notImported('source', $p['id']))
            {
                // various processing here, mainly breaking up the data for inserting into four different tables
            }
            $cycle++;
        }
    }

Any thoughts?

Edited

To shed further light on what I'm doing: I'm grabbing most of the attributes of each element and subelement and inserting them into my db. For example, in my old code I have something like this:

$insert = array(
    'indiv_name'      => $p['fullname'],
    'indiv_first'     => $p['firstname'],
    'indiv_last'      => $p['lastname'],
    'indiv_middle'    => $p['middlename'],
    'indiv_other'     => $p['namemod'],
    'indiv_full_name' => $full_name,
    'indiv_title'     => $p['title'],
    'indiv_dob'       => $p['birthday'],
    'indiv_gender'    => $p['gender'],
    'indiv_religion'  => $p['religion'],
    'indiv_url'       => $url
);

Given the suggestion to use XMLReader (see below), how could I parse the attributes of both the main element and its subelements?

Solution

Use XMLReader.

Say your document is like this:

<test>
   <hello>world</hello>
   <foo>bar</foo>
</test>

With XMLReader:

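$xml = new XMLReader();
$xml->open('doc.xml');

// The first read() lands on the root <test> element; the loop then visits its descendants.
$xml->read();
while ($xml->read()) {
    if ($xml->nodeType === XMLReader::ELEMENT) {
        // Opening tag: print the element name.
        print $xml->name . ': ';
    } elseif ($xml->nodeType === XMLReader::TEXT) {
        // Text content of the current element.
        print $xml->value . PHP_EOL;
    }
}
$xml->close();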

This outputs:

hello: world
foo: bar
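
Since the question is mostly about attributes, here is a minimal sketch of the same read() loop pulling attributes with getAttribute(). The sample document, the names in it and the temp file are made up for illustration; only the `person` element and the `id`/`fullname` attributes come from the question.

```php
<?php
// Hypothetical sample data, shaped like the <person> records in the question.
$sample = <<<XML
<people>
   <person id="1" fullname="Jane Doe" gender="F"/>
   <person id="2" fullname="John Roe" gender="M"/>
</people>
XML;

// Write the sample to a temp file so XMLReader can stream it from disk.
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $sample);

$xml = new XMLReader();
$xml->open($path);

$names = [];
while ($xml->read()) {
    // Only react to opening <person> tags; every other node is skipped.
    if ($xml->nodeType === XMLReader::ELEMENT && $xml->name === 'person') {
        // getAttribute() returns null when the attribute is absent.
        $names[$xml->getAttribute('id')] = $xml->getAttribute('fullname');
    }
}
$xml->close();
unlink($path);

print_r($names);
```

Only the current node is held in memory at any point, so this approach should stay well under the memory limit regardless of file size.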

The nice thing is that you can also use expand() to fetch the current node as a DOMNode object.
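
For the controller in the question, one way to combine the two is to stream with XMLReader and expand() each `<person>` into a small SimpleXML object, so the existing `$p['...']` attribute code keeps working one record at a time. This is only a sketch under assumptions: the function name `streamPeople`, the `<address city="...">` subelement and the sample data are hypothetical, and `<person>` elements are assumed to be siblings.

```php
<?php
/**
 * Yield each <person> record as a SimpleXMLElement, one at a time.
 * 'person' comes from the question; the function name is hypothetical.
 */
function streamPeople(string $path): Generator
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }
    $doc = new DOMDocument();

    // Advance to the first <person> element.
    while ($reader->read() && $reader->name !== 'person') {
        // keep reading
    }

    while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'person') {
        // expand() builds a DOM subtree for this record only.
        $node = $reader->expand();
        yield simplexml_import_dom($doc->importNode($node, true));
        // Jump to the next <person> sibling, skipping the subtree just handled.
        $reader->next('person');
    }
    $reader->close();
}

// Hypothetical sample data with an attribute on a subelement.
$sample = <<<XML
<people>
   <person id="1" fullname="Jane Doe">
      <address city="Springfield"/>
   </person>
   <person id="2" fullname="John Roe">
      <address city="Shelbyville"/>
   </person>
</people>
XML;
$path = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($path, $sample);

$rows = [];
foreach (streamPeople($path) as $p) {
    $rows[] = [
        'source_id'  => (string) $p['id'],            // attribute of the main element
        'indiv_name' => (string) $p['fullname'],
        'city'       => (string) $p->address['city'], // attribute of a subelement
    ];
}
unlink($path);
print_r($rows);
```

The (string) casts matter: SimpleXML returns objects for attributes, and casting them before building the insert array avoids passing objects to the database layer. In the real controller the body of the foreach would hold the `_notImported()` check and the inserts instead of collecting `$rows`.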
