Iteratively parse a large XML file without using the DOM approach

Views: 87  Published: 2020/5/4 8:29:48  Tags: python, xml, xml-parsing, lxml

Question

I have an XML file:

<temp>
  <email id="1" Body="abc"/>
  <email id="2" Body="fre"/>
  .
  .
  <email id="998349883487454359203" Body="hi"/>
</temp>

I want to read the XML file one email tag at a time. That is, I want to read the email with id=1 and extract its Body, then read the email with id=2 and extract its Body, and so on.

I tried doing this with a DOM-based XML parser, but because my file is 100 GB, that approach does not work. I then tried:

  from xml.etree import ElementTree as ET
  tree = ET.parse('myfile.xml')
  root = ET.parse('myfile.xml').getroot()
  for i in root.findall('email/'):
      print(i.get('Body'))

Now that I have the root, I don't understand why my code is unable to parse the file.
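A likely reason the loop above prints nothing: in ElementTree's path syntax, a trailing slash is treated as an implicit *, so root.findall('email/') selects the children of each email element, and these elements have none. The sketch below compares the two paths on a tiny in-memory sample that stands in for the real file (an assumption for illustration). Note that ET.parse still loads the whole tree into memory, so even the corrected path does not help with a 100 GB file.

```python
import io
import xml.etree.ElementTree as ET

# Small in-memory sample standing in for the real file (illustrative assumption).
SAMPLE = b"""<temp>
  <email id="1" Body="abc"/>
  <email id="2" Body="fre"/>
</temp>"""

root = ET.parse(io.BytesIO(SAMPLE)).getroot()

# 'email/' behaves like 'email/*': it looks for children of each <email>
# element, and these elements are empty, so nothing is found.
with_slash = root.findall('email/')

# 'email' selects the <email> children of <temp>, which is what was intended.
without_slash = root.findall('email')

print(len(with_slash))
print([e.get('Body') for e in without_slash])
```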

When I use iterparse instead, the code throws the following error:

 "UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 437: ordinal not in range(128)"

Can anyone help?
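The message reports an encode error, so it most likely comes from outputting the extracted text rather than from parsing it: under Python 2, printing a unicode value containing u'\u20ac' (the euro sign) fails when the output stream's encoding is ASCII, which is common when output is redirected. A minimal Python 3 sketch that avoids depending on the console encoding writes each Body to a file opened with an explicit encoding='utf-8'. The helper name dump_bodies, the sample input and the temporary output path are hypothetical.

```python
import io
import os
import tempfile
from xml.etree.ElementTree import iterparse


def dump_bodies(source, out_path):
    """Hypothetical helper: stream <email> elements from source and write
    each Body attribute, one per line, to a UTF-8 encoded text file.
    Returns the number of bodies written."""
    count = 0
    # An explicit encoding means non-ASCII text such as the euro sign
    # is never pushed through the ASCII codec.
    with open(out_path, 'w', encoding='utf-8') as out:
        for _, elem in iterparse(source):
            if elem.tag == 'email':
                out.write(elem.get('Body', '') + '\n')
                count += 1
            # Release the element's contents once it has been handled.
            elem.clear()
    return count


# Sample input containing the character from the error message (U+20AC).
sample = io.BytesIO('<temp><email id="1" Body="5 \u20ac"/></temp>'.encode('utf-8'))
out_path = os.path.join(tempfile.mkdtemp(), 'bodies.txt')
written = dump_bodies(sample, out_path)
```

Under Python 2, the usual quick fix is instead to encode explicitly before printing, for example print body.encode('utf-8').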

Answer

An example using iterparse:

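import io
from xml.etree.ElementTree import iterparse

# In-memory stand-in for the real file (bytes, as a file opened in 'rb' mode returns)
fakefile = io.BytesIO(b"""<temp>
  <email id="1" Body="abc"/>
  <email id="2" Body="fre"/>
  <email id="998349883487454359203" Body="hi"/>
</temp>
""")
# By default iterparse reports each element when its end tag is parsed
for _, elem in iterparse(fakefile):
    if elem.tag == 'email':
        print(elem.attrib['id'], elem.attrib['Body'])
    # Free the element's contents once it has been processed
    elem.clear()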

Just replace fakefile with your real file (iterparse accepts either a file name such as 'myfile.xml' or a file object opened in binary mode). Also read this for further details.
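One caveat with the loop above: elem.clear() empties each email element, but the cleared elements stay attached to the root temp element, so with a very large number of records the root's child list can still keep growing. A commonly used variant takes the root from the first 'start' event and clears the root after each processed record. The generator name iter_bodies and its tag parameter are hypothetical; this is a sketch assuming the flat layout shown in the question.

```python
import io
from xml.etree.ElementTree import iterparse


def iter_bodies(source, tag='email'):
    """Hypothetical helper: yield (id, Body) for each <tag> element while
    keeping memory bounded. source is a file name or a binary file object."""
    context = iterparse(source, events=('start', 'end'))
    # The first event is the 'start' of the root element (<temp> here).
    _, root = next(context)
    for event, elem in context:
        if event == 'end' and elem.tag == tag:
            yield elem.get('id'), elem.get('Body')
            # Detach everything parsed so far from the root so that
            # finished records can be garbage-collected.
            root.clear()


sample = io.BytesIO(b"""<temp>
  <email id="1" Body="abc"/>
  <email id="2" Body="fre"/>
  <email id="998349883487454359203" Body="hi"/>
</temp>""")
for email_id, body in iter_bodies(sample):
    print(email_id, body)
```

For the real file, pass the path instead, e.g. iter_bodies('myfile.xml').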
