Showing progress of Python's XML parser when loading a huge file

Question

I'm using Python's built-in XML parser to load a 1.5 GB XML file, and it takes all day.

from xml.dom import minidom
xmldoc = minidom.parse('events.xml')

I need to know how to get inside that call and measure its progress so I can show a progress bar. Any ideas?

minidom has another method called parseString() that returns a DOM tree, assuming the string you pass it is valid XML. If I were to split the file into chunks myself and pass them to parseString() one at a time, could I merge all the DOM trees back together at the end?

Recommended answer

Your use case calls for a SAX parser rather than DOM. DOM loads the entire document into memory, whereas SAX parses the file incrementally and calls the event handlers you write as it goes. That should be much more efficient for a file this size, and it also lets you write a progress indicator.
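As a rough illustration of the event-handler style, here is a minimal sketch using the standard library's xml.sax module. The element name "event" and the names EventCounter and count_events are assumptions made up for this example (the question only shows a file called events.xml); a real handler would do whatever per-element work you need.

```python
import io
import xml.sax


class EventCounter(xml.sax.ContentHandler):
    """Counts <event> elements as the parser streams through the input.

    The element name "event" is an assumption for illustration only.
    """

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        # Called once per opening tag; nothing is kept in memory
        # except the running count.
        if name == "event":
            self.count += 1


def count_events(source):
    """Parse a filename or binary file object and return the <event> count."""
    handler = EventCounter()
    xml.sax.parse(source, handler)
    return handler.count


# Small in-memory document standing in for the real file.
sample = io.BytesIO(b"<events><event id='1'/><event id='2'/></events>")
print(count_events(sample))
```

Because the handler only keeps the state it needs, memory use stays roughly constant regardless of file size, which is the main difference from building a full DOM tree.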

I also recommend trying the expat parser at some point; it is very useful: http://docs.python.org/library/pyexpat.html
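For comparison, a minimal sketch of driving expat directly through xml.parsers.expat might look like the following. The function name count_start_tags and the chunk size are assumptions for illustration, not part of the original answer.

```python
import io
import xml.parsers.expat


def count_start_tags(fileobj, chunk_size=64 * 1024):
    """Feed a binary file object to expat in chunks and count start tags."""
    counts = {}

    def on_start(name, attrs):
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start

    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            # An empty final call tells expat the document is complete.
            parser.Parse(b"", True)
            break
        parser.Parse(chunk, False)
    return counts


sample = io.BytesIO(b"<events><event/><event/></events>")
print(count_start_tags(sample))
```

Since you control the read loop yourself here, you always know how many bytes have been handed to the parser so far.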

For progress using SAX:

Because SAX reads the file incrementally, you can wrap the file object you pass in with your own object and keep track of how much has been read.
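One way to sketch that wrapper idea is shown below. The names ProgressFile, parse_with_progress and print_progress are hypothetical helpers invented for this example; the only library calls used are xml.sax.parse and ordinary file methods. The sketch assumes the parser only needs read() and close() on the object it is given, which matches how the standard expat-based SAX reader consumes a byte stream.

```python
import io
import xml.sax


class ProgressFile:
    """Wraps a binary file object and counts the bytes the parser has read."""

    def __init__(self, fileobj, total, callback):
        self._f = fileobj
        self.total = total
        self.bytes_read = 0
        self._callback = callback

    def read(self, size=-1):
        data = self._f.read(size)
        self.bytes_read += len(data)
        self._callback(self.bytes_read, self.total)
        return data

    def close(self):
        # The SAX reader closes its input when it finishes.
        self._f.close()


def parse_with_progress(fileobj, total, handler, callback):
    """Run a SAX parse over fileobj, reporting progress through callback."""
    xml.sax.parse(ProgressFile(fileobj, total, callback), handler)


def print_progress(done, total):
    percent = 100.0 * done / total if total else 100.0
    print(f"{percent:5.1f}% ({done}/{total} bytes)")


# In-memory stand-in for the real file. For a file on disk you would
# open it in binary mode and use os.path.getsize(path) as the total.
data = b"<events>" + b"<event/>" * 1000 + b"</events>"
parse_with_progress(io.BytesIO(data), len(data),
                    xml.sax.ContentHandler(), print_progress)
```

The callback fires once per read the parser performs, so for a large file you may want to throttle how often the progress bar is actually redrawn.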

Edit: I also don't like the idea of splitting the file yourself and joining the DOM trees at the end; at that point you would be better off writing your own XML parser. I recommend using a SAX parser instead. I also wonder what your purpose is in reading a 1.5 GB file into a DOM tree. It looks like SAX would be a better fit here.
