lxml.etree iterparse() and parsing an element completely

Question

I have an XML file with nodes that look like this:

<trkpt lat="-37.7944415" lon="144.9616159">
  <ele>41.3681107</ele>
  <time>2015-04-11T03:52:33.000Z</time>
  <speed>3.9598</speed>
</trkpt>

I am using lxml.etree.iterparse() to parse the tree iteratively. I loop over the children of each trkpt element and want to print each child's text value, e.g.

for event, element in etree.iterparse(infile, events=("start", "end")):
    if element.tag == NAMESPACE + 'trkpt':
        for child in list(element):
            print child.text

The problem is that at this stage the child nodes have no text, so print outputs 'None'.

I verified this by replacing the 'print child.text' statement with 'print etree.tostring(child)'. The output looks like this:

<ele/>
<time/>
<speed/>

According to the documentation, "Note that the text, tail, and children of an Element are not necessarily present yet when receiving the start event. Only the end event guarantees that the Element has been parsed completely."

So I changed my for loop to the following (note the 'if event == "end":' statement):

for event, element in etree.iterparse(infile, events=("start", "end")):
    if element.tag == NAMESPACE + 'trkpt':
        if event == "end":
            for child in list(element):
                print child.text

But I still get the same result. Any help would be greatly appreciated.

Recommended answer

Are you sure you are not calling something like element.clear() after your conditional statement, like this?

for event, element in etree.iterparse(infile, events=("start", "end")):
    if element.tag == NAMESPACE + 'trkpt' and event == 'end':
        for child in list(element):
            print(child.text)
    # Runs for every event of every element, including the children's own events
    element.clear()

The problem is that the parser issues the events for the child elements before it sends the end event for trkpt, because it encounters the end tags of the nested elements first. If you modify the parsed elements (for example, by clearing them) before the end event of the outer element arrives, the behaviour you describe may occur.
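The event order described above can be reproduced with a small in-memory document. The following is a minimal sketch, not the asker's actual code: the NAMESPACE value (the GPX 1.1 namespace), the wrapping gpx element, and the helper name child_texts are assumptions made for illustration. It falls back to the standard library's xml.etree.ElementTree if lxml is not installed, since that parser's iterparse() delivers end events in the same order.

```python
import io

try:
    from lxml import etree  # the library discussed in this thread
except ImportError:
    # Assumption: the stdlib iterparse() is close enough for this demonstration.
    import xml.etree.ElementTree as etree

# Assumed namespace; the question does not show the value of NAMESPACE.
NAMESPACE = "{http://www.topografix.com/GPX/1/1}"

SAMPLE = b"""<gpx xmlns="http://www.topografix.com/GPX/1/1">
  <trkpt lat="-37.7944415" lon="144.9616159">
    <ele>41.3681107</ele>
    <time>2015-04-11T03:52:33.000Z</time>
    <speed>3.9598</speed>
  </trkpt>
</gpx>"""


def child_texts(xml_bytes, clear_every_element):
    """Return the child texts seen at each trkpt 'end' event (hypothetical helper)."""
    texts = []
    for _event, element in etree.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if element.tag == NAMESPACE + "trkpt":
            texts.extend(child.text for child in element)
            element.clear()
        elif clear_every_element:
            # ele, time and speed reach this branch before trkpt's 'end' event,
            # so their text is already gone when trkpt is processed.
            element.clear()
    return texts


print(child_texts(SAMPLE, clear_every_element=True))
# [None, None, None]
print(child_texts(SAMPLE, clear_every_element=False))
# ['41.3681107', '2015-04-11T03:52:33.000Z', '3.9598']
```

Clearing only the trkpt element, after its children have been read, keeps the text available while still releasing memory.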

Consider the following alternative:

for event, element in etree.iterparse(infile, events=('end',),
                                      tag=NAMESPACE + 'trkpt'):
    for child in element:
        print(child.text)
    # Only trkpt elements reach this point, and only once they are complete
    element.clear()
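For a version that can be run without a GPX file on disk, here is a hedged sketch of the same idea wrapped in a generator. The name iter_trackpoints, the NAMESPACE value, and the surrounding gpx/trk/trkseg elements are assumptions for illustration. The tag check is done in the loop body rather than through the tag= argument so that the sketch also runs with the standard library's xml.etree.ElementTree when lxml is unavailable.

```python
import io

try:
    from lxml import etree
except ImportError:
    # Assumption: stdlib fallback, which has no tag= argument on iterparse().
    import xml.etree.ElementTree as etree

# Assumed namespace; the question does not show the value of NAMESPACE.
NAMESPACE = "{http://www.topografix.com/GPX/1/1}"

SAMPLE_GPX = b"""<gpx xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="-37.7944415" lon="144.9616159">
      <ele>41.3681107</ele>
      <time>2015-04-11T03:52:33.000Z</time>
      <speed>3.9598</speed>
    </trkpt>
  </trkseg></trk>
</gpx>"""


def iter_trackpoints(source):
    """Yield one dict per trkpt, clearing the element afterwards (hypothetical helper)."""
    for _event, element in etree.iterparse(source, events=("end",)):
        if element.tag != NAMESPACE + "trkpt":
            continue  # leave children untouched until their parent is complete
        point = {"lat": element.get("lat"), "lon": element.get("lon")}
        for child in element:
            # Drop the "{namespace}" prefix so keys read 'ele', 'time', 'speed'.
            point[child.tag.rsplit("}", 1)[-1]] = child.text
        yield point
        element.clear()  # safe now: every value has been copied out


for point in iter_trackpoints(io.BytesIO(SAMPLE_GPX)):
    print(point["time"], point["ele"], point["speed"])
```

Copying the values out before calling element.clear() means the caller never holds a reference to an element that is about to be emptied.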
