How to find a text node's parent element?

Question

If I use:

import requests
from lxml import html

response = requests.get(url='someurl')
tree = html.document_fromstring(response.text)


all_text = tree.xpath('//text()')     # returns every text node on the page

The all_text list now contains all the text from the page. Suppose I then pick one string out of it:

text_searched = all_text[all_text.index('any string which is in all_text list')]

Is it possible to get the element that contains the searched text?

Recommended answer

You can use the getparent() method for this, for example:

.....
.....
all_text = tree.xpath('//text()')

first_text = all_text[0]
parent_element = first_text.getparent()

print(html.tostring(parent_element, encoding='unicode'))
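To tie this back to the question, here is a minimal sketch of looking up one specific string and asking for its parent. The `page` markup and the search string `'needle'` are made up for illustration and stand in for `response.text` and the real text being searched; the sketch relies on lxml's default behaviour of returning "smart" strings from `xpath()`.

```python
from lxml import html

# Hypothetical page content standing in for response.text.
page = "<html><body><h1>Title</h1><p>needle</p></body></html>"
tree = html.document_fromstring(page)

all_text = tree.xpath('//text()')

# list.index() compares by string value, and the item it locates is still
# an lxml "smart" string, so getparent() is available on it.
text_searched = all_text[all_text.index('needle')]
parent_element = text_searched.getparent()

print(parent_element.tag)  # p
```

Note that `all_text.index(...)` only finds the first exact match; if the same string occurs several times on the page, each occurrence has its own parent.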


Note that getparent() may not behave as you expect when the text comes after an element node within the same parent element. In the tree model implemented by lxml, such text is stored as the tail of the preceding element rather than as a child of the containing element, so getparent() returns the preceding element. The example below illustrates this:

from lxml import html
raw = '''<div>
    <span>foo</span>
    bar
</div>'''
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')
print(list(texts))
# output : ['foo', '\n    bar\n']

print([html.tostring(e.getparent(), encoding='unicode') for e in texts])
# output : ['<span>foo</span>\n    bar\n', '<span>foo</span>\n    bar\n']
# see that calling getparent() on 'bar' returns the <span> not the <div>
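If what you actually want is the enclosing element (the <div> for 'bar' in the example above), one possible workaround is to check the `is_tail` attribute that lxml sets on XPath text results and step up one more level for tail text. The helper name `containing_element` is hypothetical, not part of lxml; treat this as a sketch rather than the canonical approach.

```python
from lxml import html


def containing_element(text_node):
    """Return the element that encloses an lxml XPath text result.

    Hypothetical helper: for tail text, getparent() yields the preceding
    sibling element, so we go up one more level to reach the real container.
    """
    owner = text_node.getparent()
    if text_node.is_tail:
        return owner.getparent()
    return owner


raw = '<div><span>foo</span>bar</div>'
root = html.fromstring(raw)
texts = root.xpath('//text()[normalize-space()]')

tags = [containing_element(t).tag for t in texts]
print(tags)  # ['span', 'div']
```

Here 'foo' is the `.text` of the <span>, so its container is the <span>, while 'bar' is the <span>'s tail, so the helper returns the <div>.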
