How to use regular expressions in lxml XPath?
Problem description
I'm using a construction like this:
```python
from lxml.html import parse  # assumed: the question does not show its imports

doc = parse(url).getroot()
links = doc.xpath("//a[text()='some text']")
```
But I need to select all links whose text begins with "some text", so I'm wondering whether there is any way to use a regular expression here. I didn't find anything in the lxml documentation.
Recommended answer
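You can do this, although for this particular example you don't need regular expressions: XPath's built-in `starts-with()` function is enough. lxml supports regular expressions through the EXSLT extension functions (see the lxml documentation for the `XPath` class; they also work with the `xpath()` method). Anchor the pattern with `^` so that it only matches text at the beginning: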
```python
doc.xpath("//a[re:match(text(), '^some text')]",
          namespaces={"re": "http://exslt.org/regular-expressions"})
```
Note that you must pass the namespace mapping so that lxml knows what the `re` prefix in the XPath expression stands for.
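To make the difference between the approaches concrete, here is a small self-contained sketch. It assumes lxml is installed; the sample HTML, the `REGEXP_NS` mapping name and the `hrefs` helper are made up for illustration. It compares XPath 1.0's `starts-with()`, the EXSLT `re:test()` function (which returns a boolean, so it reads naturally in a predicate), an unanchored pattern, the case-insensitive `"i"` flag, and a precompiled `etree.XPath` object. It uses `.` instead of `text()` so that text nested in child elements is also considered.

```python
from lxml import etree, html

# Hypothetical sample page standing in for the document fetched in the question.
PAGE = """
<html><body>
  <a href="/1">some text here</a>
  <a href="/2">Some Text, different case</a>
  <a href="/3">prefix then some text</a>
  <a href="/4"><b>some text</b> inside a child element</a>
</body></html>
"""

# Bind the "re" prefix to the EXSLT regular-expressions namespace.
REGEXP_NS = {"re": "http://exslt.org/regular-expressions"}

doc = html.fromstring(PAGE)


def hrefs(links):
    """Return the href of each matched <a> element, in document order."""
    return [a.get("href") for a in links]


# Plain prefix check, no regex: starts-with() on the element's string value.
prefix_links = doc.xpath("//a[starts-with(normalize-space(.), 'some text')]")

# EXSLT regex; '^' anchors the pattern at the start of the text.
regex_links = doc.xpath("//a[re:test(., '^some text')]", namespaces=REGEXP_NS)

# Without '^' the pattern may match anywhere in the text.
unanchored_links = doc.xpath("//a[re:test(., 'some text')]", namespaces=REGEXP_NS)

# The optional third argument "i" makes the match case-insensitive.
ci_links = doc.xpath("//a[re:test(., '^some text', 'i')]", namespaces=REGEXP_NS)

# Compile once with etree.XPath and reuse the callable on any element.
find_links = etree.XPath("//a[re:test(., '^some text')]", namespaces=REGEXP_NS)

print(hrefs(prefix_links))      # ['/1', '/4']
print(hrefs(regex_links))       # ['/1', '/4']
print(hrefs(unanchored_links))  # ['/1', '/3', '/4']
print(hrefs(ci_links))          # ['/1', '/2', '/4']
print(hrefs(find_links(doc)))   # ['/1', '/4']
```

Link `/4` shows why `.` can matter. Its text begins inside a child `<b>` element, so a predicate built on `text()` would only see the anchor's own text node (" inside a child element") and would not match.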