Looping over a large XML file
Question
I'm having problems looping over an XML file of about 20-30 MB (650,000 rows).
Here is my pseudocode:
<cffile action="READ" file="file.xml" variable="usersRaw">
<cfset usersXML = XmlParse(usersRaw)>
<cfset advsXML = XmlSearch(usersXML, "/advs/advuser")>
<cfset users = XmlSearch(usersXML, "/advs/advuser/user")>
<cfset numUsers = ArrayLen(users)>
<cfloop index="i" from="1" to="#numUsers#">
    ... some selects...
    ... insert...
    <!--- vehicles that belong to the i-th advuser --->
    <cfset advs = advsXML[i]["vehicle"]>
    <cfset numAdvs = ArrayLen(advs)>
    <cfloop index="k" from="1" to="#numAdvs#">
        ... insert... or ... update...
    </cfloop>
</cfloop>
The structure of the XML file is (yes, it is not very good :-):
<advs>
    <advuser>
        <user>
        </user>
        <vehicle>
        </vehicle>
    </advuser>
</advs>
After ~120,000 rows I get an "Out of memory" error.
How can I improve the performance of my script?
How can I diagnose where memory consumption peaks?
Recommended answer
@SamG is correct that ColdFusion's built-in XML parsing can't handle a file this size: XmlParse uses a DOM parser, which builds the whole document in memory. SAX avoids that but is painful to work with. Use a StAX parser instead, which reads the file as a stream and provides a much simpler iterator interface. See the answer I gave to another question for an example of how to do this with ColdFusion.
This is roughly what you'd do for your example:
<!--- open the file as a buffered stream instead of reading it all into a variable --->
<cfset fis = createObject("java", "java.io.FileInputStream").init(
    "#getDirectoryFromPath(getCurrentTemplatePath())#/file.xml"
)>
<cfset bis = createObject("java", "java.io.BufferedInputStream").init(fis)>
<cfset XMLInputFactory = createObject("java", "javax.xml.stream.XMLInputFactory").newInstance()>
<cfset reader = XMLInputFactory.createXMLStreamReader(bis)>
<!--- condition is written without pound signs so it is re-evaluated on each pass --->
<cfloop condition="reader.hasNext()">
    <cfset event = reader.next()>
    <cfif event EQ reader.START_ELEMENT>
        <cfswitch expression="#reader.getLocalName()#">
            <cfcase value="advs">
                <!--- root node, do nothing --->
            </cfcase>
            <cfcase value="advuser">
                <!--- set values used later on for inserts, selects, updates --->
            </cfcase>
            <cfcase value="user">
                <!--- some selects and insert --->
            </cfcase>
            <cfcase value="vehicle">
                <!--- insert or update --->
            </cfcase>
        </cfswitch>
    </cfif>
</cfloop>
<cfset reader.close()>
<!--- XMLStreamReader.close() does not close the underlying stream, so close it explicitly --->
<cfset bis.close()>
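For reference, the same pull loop can be sketched in plain Java, since the CFML above only calls the javax.xml.stream classes through createObject. This is a minimal sketch, not the original author's code: the class name AdvStreamSketch and the method vehiclesPerAdvuser are hypothetical, and counting vehicles stands in for the real selects, inserts and updates. It also shows one way to use END_ELEMENT to know when an advuser block is finished, which the CFML outline does not cover.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical class name, used only for this illustration.
public class AdvStreamSketch {

    // Streams the document and returns the number of <vehicle> elements
    // found inside each <advuser>, in document order.
    public static List<Integer> vehiclesPerAdvuser(Reader source) throws XMLStreamException {
        List<Integer> counts = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        try {
            int vehicles = 0;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = reader.getLocalName();
                    if (name.equals("advuser")) {
                        vehicles = 0;      // a new advuser block starts
                    } else if (name.equals("vehicle")) {
                        vehicles++;        // the real code would insert or update here
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals("advuser")) {
                    counts.add(vehicles);  // the advuser block is complete
                }
            }
        } finally {
            reader.close();                // does not close the underlying source
        }
        return counts;
    }

    public static void main(String[] args) throws XMLStreamException {
        // Tiny in-memory sample with the same element names as the question.
        String xml = "<advs>"
                + "<advuser><user></user><vehicle></vehicle><vehicle></vehicle></advuser>"
                + "<advuser><user></user><vehicle></vehicle></advuser>"
                + "</advs>";
        System.out.println(vehiclesPerAdvuser(new StringReader(xml))); // prints [2, 1]
    }
}
```

Only the current event is held in memory at any time, so the footprint stays roughly constant regardless of file size, provided the per-element work does not itself accumulate data.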