Python - Convert Very Large (6.4GB) XML files to JSON


Problem Description

Essentially, I have a 6.4GB XML file that I'd like to convert to JSON and then save to disk. I'm currently running OSX 10.8.4 with an i7 2700k and 16GB of RAM, and running 64-bit Python (double-checked). I'm getting an error that I don't have enough memory to allocate. How do I go about fixing this?

import json
import xmltodict

print 'Opening'
f = open('large.xml', 'r')
data = f.read()  # reads the entire 6.4GB file into memory at once
f.close()

print 'Converting'
newJSON = xmltodict.parse(data)

print 'Json Dumping'
newJSON = json.dumps(newJSON)

print 'Saving'
f = open('newjson.json', 'w')
f.write(newJSON)
f.close()

Error:

Python(2461) malloc: *** mmap(size=140402048315392) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Traceback (most recent call last):
  File "/Users/user/Git/Resources/largexml2json.py", line 10, in <module>
    data = f.read()
MemoryError

Recommended Answer

Many Python XML libraries support parsing XML sub-elements incrementally, e.g. xml.etree.ElementTree.iterparse and xml.sax.parse in the standard library. These are usually called "XML stream parsers".
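As a minimal sketch of the iterparse approach (the `record`/`id`/`name` tag layout and file names here are hypothetical stand-ins for your actual schema): process each sub-element as soon as its closing tag is seen, write it out as one JSON object per line, and clear() the element so memory use stays flat regardless of file size.

```python
import json
import xml.etree.ElementTree as ET
from io import BytesIO

def stream_records(source, tag='record'):
    # iterparse yields (event, element) pairs as the file is read,
    # so the whole tree is never held in memory at once
    for event, elem in ET.iterparse(source, events=('end',)):
        if elem.tag == tag:
            yield {child.tag: child.text for child in elem}
            elem.clear()  # release the finished element's children

# Tiny in-memory sample; real code would pass open('large.xml', 'rb')
sample = b"<root><record><id>1</id><name>alpha</name></record>" \
         b"<record><id>2</id><name>beta</name></record></root>"

with open('newjson.jsonl', 'w') as out:
    for record in stream_records(BytesIO(sample)):
        out.write(json.dumps(record) + '\n')  # one JSON object per line
```

Note the output is JSON Lines rather than a single giant JSON document; producing one valid 6GB+ JSON array would itself require either streaming the output brackets/commas manually or holding everything in memory.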

The xmltodict library you are using also has a streaming mode; it may solve your problem:

https://github.com/martinblech/xmltodict#streaming-mode
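A sketch of xmltodict's streaming mode (the sample data and output path are hypothetical): passing item_depth makes parse() hand each element at that depth to a callback instead of building the whole document dict, so only one record is in memory at a time.

```python
import json
import xmltodict

def stream_to_jsonl(xml_source, out_file, depth=2):
    """Convert XML to JSON Lines without loading the whole tree."""
    def handle(path, item):
        # item is one parsed sub-element at the given depth;
        # path is the list of (tag, attrs) pairs leading to it
        out_file.write(json.dumps(item) + '\n')
        return True  # keep parsing; False would abort the parse
    # item_depth=2 streams each depth-2 element to the callback
    # instead of accumulating the full document; parse() returns None
    xmltodict.parse(xml_source, item_depth=depth, item_callback=handle)

# Tiny stand-in for the 6.4GB file; real code would pass
# open('large.xml', 'rb') as xml_source
sample = ("<root><record><id>1</id><name>alpha</name></record>"
          "<record><id>2</id><name>beta</name></record></root>")

with open('newjson.jsonl', 'w') as f:
    stream_to_jsonl(sample, f)
```

As with the iterparse sketch, this writes JSON Lines (one object per line), which sidesteps holding a single multi-gigabyte JSON structure in memory.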

