lxml::etree::_ElementStringResult.getparent() works incorrectly

Problem description

I did not find anyone explaining this error...

I'm using lxml 3.1.0.

Given HTML/XML like this:

<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>

Running the following returns an _ElementStringResult for the string " XT 100 LV Ski Boots 2014":

>>> elemstr = tree.xpath('//body//h1/text()')[0]

However, running the following gives:

>>> parent = elemstr.getparent()
>>> tree.getpath(parent)
/html/body/therestofthepath/h1/strong

Has anyone had a problem like this? Is there any alternative to manually checking whether the text matches the parent's text and, if it doesn't, checking the text of the parent's children instead?

Recommended answer

I think this is the correct behaviour for ElementTree (ET). It stems from the way ET represents text nodes: only a text node that comes before an element's first child element is stored in that element's text attribute.
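To make this concrete, here is a small sketch contrasting two elements. The <p> sample is made up for illustration; the <h1> is the markup from the question. Only text that precedes any child element ends up in .text, which is why the question's <h1> has no .text at all.

```python
import lxml.etree

# Hypothetical sample (not from the question): <p> starts with a text node.
p = lxml.etree.fromstring("<p>first<b>bold</b>second</p>")

# The question's markup: <h1> starts with a child element, not with text.
h1 = lxml.etree.fromstring(
    '<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>'
)

# Text before the first child element is stored in the parent's .text.
print(repr(p.text))   # 'first'
# <h1> begins with <strong>, so there is no leading text node.
print(repr(h1.text))  # None
```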

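Every other interleaved text node is stored as the tail of its preceding sibling element, in this case the strong element. That is why getparent() on such a string returns <strong>: it is the element that holds the tail.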

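import lxml.etree

xml = """<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>"""

tree = lxml.etree.fromstring(xml)
elemstr = tree.xpath('//h1/text()')[0]
# The text after </strong> is stored as strong.tail, so getparent()
# returns the <strong> element, and its .tail is the same string.
print(elemstr.getparent().tail)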
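As for avoiding a manual text comparison: besides getparent(), the smart strings that lxml returns from XPath text() queries also carry is_text and is_tail flags. A minimal sketch, assuming those attributes are available in your lxml version, with a hypothetical helper text_owner() that goes up one extra level when the string is a tail:

```python
import lxml.etree


def text_owner(smart_str):
    """Return the element that contains an XPath text() result.

    Hypothetical helper: relies on lxml's smart-string getparent()
    and is_tail attributes.
    """
    el = smart_str.getparent()
    if smart_str.is_tail:
        # A tail string is attached to its preceding sibling element,
        # so the element that actually contains it is one level up.
        return el.getparent()
    return el


xml = """<h1 class="fn"><strong class="brand">Lange</strong> XT 100 LV Ski Boots 2014</h1>"""
root = lxml.etree.fromstring(xml)
tree = root.getroottree()  # getpath() lives on the ElementTree

for s in root.xpath('//h1/text()'):
    print(repr(str(s)), s.is_tail, tree.getpath(text_owner(s)))
```

For a string that is an element's own .text, is_tail is False and getparent() already returns the containing element, so no adjustment is needed.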
