lxml查找< div> id ='post- [0-9] ' [英] lxml find <div> with id='post-[0-9]'

查看：58 发布时间：2020/5/4 8:35:03 python xpath lxml

本文介绍了lxml查找< div> id ='post- [0-9] *'的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

我正在尝试查找所有ID以"post- {这里有很多数字}开头"的div标签我尝试过这样的事情:

tree.xpath("//div[starts-with(@id,'post-[0-9]')]")

但实际上不起作用.有没有一种方法可以在不导入python中的正则表达式的情况下做到这一点?

解决方案

XPath 1.0 不支持正则表达式，即函数starts-with不支持正则表达式.

Lxml不支持XPath 2.0.您有以下三种选择:

切换到能够处理XPath 2.0的处理器.然后，您可以使用 fn:matches()函数. p>
使用符合XPath 1.0的解决方案.这是很丑陋的，但是它是可行的，并且在某些情况下可能是最简单的解决方案.但是，这不是一般的解决方案！它将用-替换@id中的数字，并与此匹配.因此，如果原始的id是类似post--的东西，这也将实现.使用一个您不会在此位置出现的字符.

tree.xpath("//div[starts-with(translate(@id, '0123456789', '----------'), 'post--')]")

regexpNS = "http://exslt.org/regular-expressions"
r = tree.xpath("//div[re:test(@id, '^post-[0-9]')]", namespaces={'re': regexpNS})

I am trying to find all div tags with id begins with "post-{here a lot of digits}" I tried something like this:

tree.xpath("//div[starts-with(@id,'post-[0-9]')]")

But does not really work. Is there a way to do this without importing regular expressions in python?

解决方案

XPath 1.0 does not support regular expressions, i.e. the function starts-with does not support regular expressions.

Lxml does not support XPath 2.0. You have the following three options:

Switch to a processor who is able to handle XPath 2.0. You can then use the fn:matches() function.
Use a XPath 1.0 compliant solution. This is rather ugly, but it works and may in some circumstances be the easiest solution. However, this is not a general solution! It will replace the numbers in @id with a - and match against this. So this would also deliver true if the original id was something like post--. Use a character which you know will not occur at this position.

tree.xpath("//div[starts-with(translate(@id, '0123456789', '----------'), 'post--')]")

lxml supports the EXSLT namespaces and you can use the regex functions from there. In my opinion this is the best solution.

regexpNS = "http://exslt.org/regular-expressions"
r = tree.xpath("//div[re:test(@id, '^post-[0-9]')]", namespaces={'re': regexpNS})

这篇关于lxml查找< div> id ='post- [0-9] *'的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

lxml查找&lt; div&gt; id ='post- [0-9] *' [英] lxml find &lt;div&gt; with id=&#39;post-[0-9]*&#39;