Parsing extremely large XML files in PHP


Question

I need to parse XML files that are 40 GB in size, normalize the data, and insert it into a MySQL database. It is not yet clear how much of each file I need to store in the database, and I do not know the XML structure either.

Which parser should I use, and how would you go about doing this?

Recommended answer

In PHP, you can read extremely large XML files with XMLReader:

$reader = new XMLReader();
$reader->open($xmlfile);
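
To illustrate what happens after `open()`, here is a minimal sketch of the read loop. It assumes a tiny sample file written to a temporary path, and the helper name `countElements` and the element name `item` are hypothetical, not part of the original answer. Only one node is held in memory at a time, which is what makes this approach workable for very large files.

```php
<?php
// Hypothetical sample data so the sketch runs on its own.
$xmlfile = tempnam(sys_get_temp_dir(), 'xml');
file_put_contents($xmlfile, '<items><item id="1">a</item><item id="2">b</item></items>');

// Count how many elements with the given name occur in the file.
function countElements(string $path, string $name): int
{
    $reader = new XMLReader();
    if (!$reader->open($path)) {
        throw new RuntimeException("Cannot open $path");
    }

    $count = 0;
    // read() advances the cursor one node at a time and returns false at the end.
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            $count++;
        }
    }
    $reader->close();

    return $count;
}

$itemCount = countElements($xmlfile, 'item');
echo $itemCount, "\n";
unlink($xmlfile);
```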

Extremely large XML files should be stored on disk in a compressed format. This makes sense because XML compresses very well. For example, the file could be gzipped as large.xml.gz.

PHP can read such a file directly through its compression wrappers:

$xmlfile = 'compress.zlib://path/to/large.xml.gz';

$reader = new XMLReader();
$reader->open($xmlfile);
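
As a self-contained illustration of the wrapper, the sketch below first creates a small gzipped file with `gzencode()` (this requires the zlib extension) and then streams it back through `compress.zlib://`. The temporary path and the element name `row` are assumptions made for the example.

```php
<?php
// Hypothetical sample: write a small gzip-compressed XML file to a temp path.
$gzPath = tempnam(sys_get_temp_dir(), 'xmlgz');
file_put_contents($gzPath, gzencode('<root><row>1</row><row>2</row></root>'));

$reader = new XMLReader();
// The stream wrapper decompresses on the fly; no unpacked copy is written to disk.
if (!$reader->open('compress.zlib://' . $gzPath)) {
    throw new RuntimeException('Cannot open compressed XML');
}

$rows = [];
while ($reader->read()) {
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'row') {
        // readString() returns the text content of the current element.
        $rows[] = $reader->readString();
    }
}
$reader->close();
unlink($gzPath);

print_r($rows);
```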

XMLReader only lets you operate on the current node, which means it is forward-only. If you need to keep parser state, you have to build it yourself.
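
One possible way to keep such state is a stack of open element names. The sketch below, which is only an assumption about how this might be done and not the answer author's code, uses that stack to count each distinct element path. Something like this could also help when the XML structure is unknown, as in the question. The function name `collectPaths` is hypothetical.

```php
<?php
// Count how often each element path (e.g. /a/b) occurs in the document.
function collectPaths(string $xml): array
{
    $reader = new XMLReader();
    $reader->XML($xml); // for a file, use $reader->open($path) instead

    $stack = []; // names of the currently open elements
    $paths = []; // path => number of occurrences

    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT) {
            $stack[] = $reader->name;
            $path = '/' . implode('/', $stack);
            $paths[$path] = ($paths[$path] ?? 0) + 1;

            // Self-closing elements like <c/> produce no END_ELEMENT node,
            // so they have to be popped right away.
            if ($reader->isEmptyElement) {
                array_pop($stack);
            }
        } elseif ($reader->nodeType === XMLReader::END_ELEMENT) {
            array_pop($stack);
        }
    }
    $reader->close();

    return $paths;
}

$paths = collectPaths('<a><b><c/></b><b/></a>');
print_r($paths);
```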

I often find it helpful to wrap the basic movements in a set of iterators that know how to operate on XMLReader, for example iterating over elements only or over child elements only. This is outlined in "Parse XML with PHP and XMLReader".
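
The iterators from that article are not reproduced here. As a rough stand-in, the following sketch uses a PHP generator to yield every element with a given name as a small SimpleXMLElement, so that only one record is expanded in memory at a time. The function name `iterateElements` and the sample element names are hypothetical.

```php
<?php
// Yield each element named $name as a SimpleXMLElement, one at a time.
function iterateElements(XMLReader $reader, string $name): Generator
{
    while ($reader->read()) {
        // The inner loop handles matching elements that directly follow each other.
        while ($reader->nodeType === XMLReader::ELEMENT && $reader->name === $name) {
            yield new SimpleXMLElement($reader->readOuterXml());

            // next() skips the subtree that was just yielded.
            if (!$reader->next()) {
                return;
            }
        }
    }
}

$reader = new XMLReader();
$reader->XML(
    '<items><item id="1"><name>a</name></item>'
    . '<item id="2"><name>b</name></item><other/></items>'
);

$seen = [];
foreach (iterateElements($reader, 'item') as $item) {
    // Each $item is small, so the convenient SimpleXML API can be used on it.
    $seen[] = $item['id'] . ':' . $item->name;
}
$reader->close();

print_r($seen);
```

Inside the loop, each record could be normalized and passed on to the database layer, ideally in batches rather than one insert per element.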
