Running out of memory using Python ElementTree
Problem description
This works on files up to about 600 MB. Anything larger and I run out of memory (I have a 16 GB machine). How can I read the file in pieces, or a certain percentage of the XML at a time? Or is there a less memory-intensive approach?
```python
import csv
import os
import sys
import time
import xml.etree.ElementTree as ET


def main(file_name):
    start_time = time.time()
    # Parses the entire document into memory; this is what runs out of RAM on large files.
    root = ET.parse(file_name).getroot()
    csv_file_name = os.path.splitext(file_name)[0] + ".txt"
    print('\n')
    print('Output file:')
    print(csv_file_name)
    with open(csv_file_name, 'w', newline='') as file_:
        writer = csv.writer(file_, delimiter="\t")
        header = [
            # <the names of the tags here>
        ]
        writer.writerow(header)
        tags = [
            # <bunch of xml tags here>
        ]
        # Write the values, skipping the first three children of the root.
        for index in range(3, len(root)):
            row = []
            for tag in tags:
                search_query = "tags" + tag
                # Look the element up once instead of three times.
                found = root[index].find(search_query)
                if found is None or found.text is None:
                    row.append("")
                else:
                    row.append(found.text)
            writer.writerow(row)
    print('\nNumber of elements is: %s' % len(root))
    print('\nTotal run time: %s seconds' % (time.time() - start_time))


if __name__ == "__main__":
    main(sys.argv[1])
```
Recommended answer
Use cElementTree instead of ElementTree.
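Replace your ET import statement with: import xml.etree.cElementTree as ET

This applies to Python 2. From Python 3.3 on, xml.etree.ElementTree uses the C implementation automatically, and xml.etree.cElementTree was removed in Python 3.9.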
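Switching to the C implementation reduces memory use, but the whole tree is still built before any rows are written. If files still do not fit, a common alternative (not part of the original answer) is ET.iterparse: handle each record as soon as its end tag has been parsed, then discard it. The following is a minimal sketch under these assumptions: each record is a direct child of the root, as in the question's root[index] loop; the caller supplies the lookup paths, since the question's "tags" + tag queries are elided; and skip=3 mirrors range(3, len(root)). The function name xml_to_tsv and its parameters are hypothetical.

```python
import csv
import io

try:
    # Python 2 / Python 3.3-3.8: the C implementation the answer recommends.
    import xml.etree.cElementTree as ET
except ImportError:
    # Python 3.9+ removed cElementTree; ElementTree already uses the C accelerator.
    import xml.etree.ElementTree as ET


def xml_to_tsv(source, out, paths, skip=3):
    """Stream the root's direct children into tab-separated rows (hypothetical helper).

    source: file name or binary file object holding the XML.
    out:    text file object the rows are written to.
    paths:  ElementTree paths looked up relative to each record (assumption).
    skip:   number of leading records to ignore, like range(3, len(root)).
    Returns the number of direct children of the root that were seen.
    """
    writer = csv.writer(out, delimiter="\t")
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    depth = 1  # nesting depth of the element the current event belongs to
    seen = 0
    for event, elem in context:
        if event == "start":
            depth += 1
            continue
        depth -= 1
        if depth != 1:
            continue  # a deeper element, or the end of the root itself
        if seen >= skip:
            row = []
            for path in paths:
                found = elem.find(path)
                row.append("" if found is None or found.text is None else found.text)
            writer.writerow(row)
        seen += 1
        # Detach finished records from the root so they can be garbage-collected.
        root.clear()
    return seen


# Small in-memory demo; for a real file, pass the file name and an output
# file opened with open(out_name, "w", newline="").
demo = io.BytesIO(b"<root><r><a>0</a></r><r><a>1</a></r><r><a>2</a></r>"
                  b"<r><a>x</a><b>y</b></r></root>")
buf = io.StringIO()
print(xml_to_tsv(demo, buf, ["a", "b"]))
print(repr(buf.getvalue()))
```

Clearing the root after each record means the tree only holds what the parser has read ahead. Memory use is therefore bounded roughly by the largest record rather than by the file size. The per-record find() lookups are the same calls the question already makes.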