Missing some text when iterating XML elements in Python


Question

I am running the following code in Python 2.7.3 on Mac OS X 10.6.8.

import StringIO
from lxml import etree

# Read the whole file into a string.
with open('./foo', 'r') as f:
    doc = f.read()

# Parse with the lenient HTML parser, then visit each <foo> and its descendants.
tree = etree.parse(StringIO.StringIO(doc), etree.HTMLParser())
r = tree.xpath('//foo')
for i in r:
    for j in i.iter():
        print j.tag, j.text

The file foo contains:

<foo> AAA <bar> BBB </bar> XXX </foo>

The output is:

foo AAA
bar BBB

Why am I not getting the text XXX? How do I access it?

Thanks.

Recommended answer

Try this:

from lxml import etree

tree = etree.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")
foos = tree.xpath('//foo')

for foo in foos:
    for j in foo.iter():
        print j.tag, j.text, j.tail

Output:

foo  AAA  None
bar  BBB   XXX 

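The tail attribute holds the text that follows the end tag of an element, up to the next tag. Here XXX comes after </bar>, so it is stored as the tail of bar rather than as part of the text of foo.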
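To see where each piece of text ends up, here is a minimal sketch. It makes two assumptions that are not in the original post: it uses Python 3 syntax, and it uses the standard library's xml.etree.ElementTree instead of lxml, since both expose the same text and tail attributes. The helper name text_and_tail is hypothetical.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<foo> AAA <bar> BBB </bar> XXX </foo>")


def text_and_tail(element):
    """Return (tag, text, tail) for an element and all of its descendants."""
    return [(e.tag, e.text, e.tail) for e in element.iter()]


rows = text_and_tail(root)
for tag, text, tail in rows:
    print(tag, repr(text), repr(tail))

# itertext() walks the subtree and yields text and tail strings
# in document order, so nothing inside <foo> is skipped.
all_text = "".join(root.itertext())
print(repr(all_text))
```

If only the combined text matters, itertext() is probably the simpler route. Reading text and tail separately is useful when you need to know which element each fragment is attached to.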

tail is a peculiarity of lxml and ElementTree compared to other XML models, such as DOM. See http://infohost.nmt.edu/tcc/help/pubs/pylxml/web/etree-view.html for more information.
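For contrast, here is a rough sketch of how a DOM-style model represents the same document. It assumes Python 3 and the standard library's xml.dom.minidom, neither of which appears in the original answer. In the DOM, each run of text is its own child node, so XXX shows up as a child of foo instead of as a tail on bar.

```python
from xml.dom import minidom

dom = minidom.parseString("<foo> AAA <bar> BBB </bar> XXX </foo>")
foo = dom.documentElement

# Text runs are separate child nodes in the DOM; record each child's
# node name and, for text nodes only, the character data it holds.
children = [
    (node.nodeName, node.data if node.nodeType == node.TEXT_NODE else None)
    for node in foo.childNodes
]
for name, data in children:
    print(name, repr(data))
```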
