How to use lxml to find an element by text?
Question
Assume we have the following html:
<html>
<body>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html">TEXT B</a>
<a href="/7445.html">TEXT C</a>
</body>
</html>
How do I make it find the element "a", which contains "TEXT A"?
So far I've got:
import lxml.html
root = lxml.html.document_fromstring(the_html_above)
e = root.find('.//a')
I've tried:
e = root.find('.//a[@text="TEXT A"]')
but that didn't work, as the "a" tags have no attribute "text".
Is there any way I can solve this in a similar fashion to what I've tried?
Recommended answer
You are very close. Use text()= rather than @text (which indicates an attribute).
e = root.xpath('.//a[text()="TEXT A"]')
Or, if you know only that the text contains "TEXT A",
e = root.xpath('.//a[contains(text(),"TEXT A")]')
Or, if you know only that text starts with "TEXT A",
e = root.xpath('.//a[starts-with(text(),"TEXT A")]')
See the docs for more on the available string functions.
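As a rough sketch of how the three predicates above differ, the snippet below runs each against the sample document and collects the href of every match. The helper name hrefs and the specific search strings are assumptions made for illustration only; they are not part of the original answer.

```python
import lxml.html as LH

html = '''\
<html>
<body>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html">TEXT B</a>
<a href="/7445.html">TEXT C</a>
</body>
</html>'''

root = LH.fromstring(html)

def hrefs(expr):
    # Hypothetical helper: return the href of every element the XPath matches.
    return [a.get('href') for a in root.xpath(expr)]

# Exact match on the text node.
exact = hrefs('.//a[text()="TEXT A"]')
# Substring match: every link whose text contains "TEXT".
partial = hrefs('.//a[contains(text(),"TEXT")]')
# Prefix match: only the link whose text starts with "TEXT B".
prefix = hrefs('.//a[starts-with(text(),"TEXT B")]')

print(exact)
print(partial)
print(prefix)
```

With this sample, the exact and prefix forms each match a single link, while the contains form matches all three.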
For example,
import lxml.html as LH
text = '''\
<html>
<body>
<a href="/1234.html">TEXT A</a>
<a href="/3243.html">TEXT B</a>
<a href="/7445.html">TEXT C</a>
</body>
</html>'''
root = LH.fromstring(text)
e = root.xpath('.//a[text()="TEXT A"]')
print(e)
yields
[<Element a at 0xb746d2cc>]
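Note that xpath() returns a list, whereas the find() call in the question returns a single element or None. A minimal sketch of pulling out the first match and reading its text and href, assuming a shortened version of the sample document (the names matches, first, and missing, and the search string "TEXT Z", are illustrative):

```python
import lxml.html as LH

root = LH.fromstring(
    '<html><body>'
    '<a href="/1234.html">TEXT A</a>'
    '<a href="/3243.html">TEXT B</a>'
    '</body></html>'
)

# xpath() always returns a list, so guard against the empty case.
matches = root.xpath('.//a[text()="TEXT A"]')
first = matches[0] if matches else None

if first is not None:
    # .text is the element's text, .get() reads an attribute.
    print(first.text, first.get('href'))

# A query with no matches gives an empty list rather than None.
missing = root.xpath('.//a[text()="TEXT Z"]')
print(missing)
```

Comparing with "is not None" avoids relying on the truth value of an element, which lxml does not recommend.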