循环一个大的 XML 文件 [英] Looping over a large XML file

查看:20
本文介绍了循环一个大的 XML 文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在循环一个大约 20-30 MB(650000 行)的 XML 文件时遇到问题.

I'm having problems looping over an XML file about 20-30 MB (650000 rows).

这是我的元代码:

<cffile action="READ" ile="file.xml" variable="usersRaw">

<cfset usersXML = XmlParse(usersRaw)>
<cfset advsXML = XmlSearch(usersXML, "/advs/advuser")>
<cfset users = XmlSearch(usersXML, "/advs/advuser/user")>

<cfset numUsers = ArrayLen(users)>
<cfloop index="i" from="1" to="#numUsers#">
    ... some selects...
    ... insert...
    <cfset advs = annunciXml[i]["vehicle"]>
    <cfset numAdvs = ArrayLen(advs)> 
    <cfloop index="k" from="1" to="#numAdvs#">        
        ... insert... or ... update...
    </cfloop>
</cfloop>

xml 文件的结构是(是的,不是很好:-)

struct of xml file is (yes, is not very good :-)

<advs>
   <advuser>
      <user>
      </user>
      <vehicle>
      <vehicle>
   </advuser>
</advs>

大约 120,000 行后,我收到一个错误:内存不足".

After ~120,000 rows I get an error: "Out of memory".

如何提高脚本的性能?

如何诊断最大内存消耗在哪里?

How can I diagnose where there is max memory consumption?

推荐答案

@SamG 是正确的,ColdFusion XML 解析因为 DOM 解析器无法做到,但是 SAX 很痛苦,改用 StAX 解析器,它提供了一个更简单的迭代器接口.查看我提供的另一个问题的答案,了解如何使用 ColdFusion 进行此操作.

@SamG is correct that ColdFusion XML parsing can't do it because of the DOM parser, but SAX is painful, instead use a StAX parser, which provides a much simpler iterator interface. See the answer to another question I provided for an example of how to do this with ColdFusion.

这大致就是您为示例所做的:

This is roughly what you'd do for your example:

<cfset fis = createObject("java", "java.io.FileInputStream").init(
    "#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml"
)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>

<cfloop condition="#reader.hasNext()#">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="advs">
                <!--- root node, do nothing --->
            </cfcase>
            <cfcase value="advuser">
                <!--- set values used later on for inserts, selects, updates --->
            </cfcase>
            <cfcase value="user">
                <!--- some selects and insert --->
            </cfcase>
            <cfcase value="vehicle">
                <!--- insert or update --->
            </cfcase>
        </cfswitch>
    </cfif>
</cfloop>

<cfset reader.close()>

这篇关于循环一个大的 XML 文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆