lxml find <div> with id='post-[0-9]*'
Problem description
I am trying to find all div tags whose id begins with "post-" followed by a run of digits. I tried something like this:
tree.xpath("//div[starts-with(@id,'post-[0-9]')]")
But it does not work. Is there a way to do this without importing Python's regular expression module?
XPath 1.0 does not support regular expressions, and starts-with() in particular compares against a literal string: 'post-[0-9]' is matched character for character, brackets included.
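A minimal sketch of why the original attempt returns nothing, assuming lxml is installed and using made-up sample markup (the ids are hypothetical):

```python
from lxml import etree

# Hypothetical sample markup; the ids are made up for illustration.
html = """
<html><body>
  <div id="post-123">a</div>
  <div id="post-9">b</div>
  <div id="post-abc">c</div>
  <div id="header">d</div>
</body></html>
"""
tree = etree.HTML(html)

# starts-with() compares against the literal text 'post-[0-9]',
# brackets included, so no id in the sample matches.
literal = tree.xpath("//div[starts-with(@id, 'post-[0-9]')]")
print([d.get("id") for d in literal])

# Dropping the pseudo-regex matches the prefix, but cannot check
# that digits follow it: 'post-abc' is returned too.
prefix = tree.xpath("//div[starts-with(@id, 'post-')]")
print([d.get("id") for d in prefix])
```

The first query prints an empty list; the second returns post-123, post-9 and post-abc, which is why a plain prefix test is not enough on its own.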
lxml does not support XPath 2.0. You have three options:
- Switch to a processor that supports XPath 2.0. You can then use the fn:matches() function.
- Use an XPath 1.0 compliant solution. This is rather ugly, but it works and may in some circumstances be the easiest solution. However, it is not a general solution! It replaces every digit in @id with a - and matches against the result (post-123 becomes post----, which starts with post--). So it also returns true if the original id was something like post--. Use a replacement character that you know will not occur at this position.
tree.xpath("//div[starts-with(translate(@id, '0123456789', '----------'), 'post--')]")
- lxml supports the EXSLT namespaces, so you can use the regular expression functions (such as re:test()) from there. In my opinion this is the best solution.
regexpNS = "http://exslt.org/regular-expressions"
r = tree.xpath("//div[re:test(@id, '^post-[0-9]')]", namespaces={'re': regexpNS})
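To compare the translate() and EXSLT approaches side by side, here is a minimal sketch run against hypothetical sample markup (assuming lxml is installed; the ids are invented to show where the two differ). The "strict" pattern is an extra variant, assuming the intent is "post-" followed by digits only:

```python
from lxml import etree

# Hypothetical sample markup; ids chosen to show where the approaches differ.
html = """
<html><body>
  <div id="post-123">a</div>
  <div id="post-7x">b</div>
  <div id="post--1">c</div>
  <div id="post-abc">d</div>
</body></html>
"""
tree = etree.HTML(html)
REGEXP_NS = "http://exslt.org/regular-expressions"


def ids(elements):
    """Return the id attribute of each matched element, in document order."""
    return [e.get("id") for e in elements]


# XPath 1.0 option: map every digit to '-' with translate(), then test the
# prefix. 'post--1' becomes 'post---', so it is a false positive.
translated = tree.xpath(
    "//div[starts-with(translate(@id, '0123456789', '----------'), 'post--')]"
)

# EXSLT option: re:test() checks a real regular expression.
exslt = tree.xpath(
    "//div[re:test(@id, '^post-[0-9]')]", namespaces={"re": REGEXP_NS}
)

# Assumed stricter variant: 'post-' followed by digits and nothing else.
strict = tree.xpath(
    "//div[re:test(@id, '^post-[0-9]+$')]", namespaces={"re": REGEXP_NS}
)

print(ids(translated))  # ['post-123', 'post-7x', 'post--1']
print(ids(exslt))       # ['post-123', 'post-7x']
print(ids(strict))      # ['post-123']
```

Note that '^post-[0-9]' only checks the first character after the prefix, so post-7x still matches; anchor the pattern with +$ if the whole remainder must be digits.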