lxml find <div> with id='post-[0-9]*'


Question

I am trying to find all div tags whose id starts with "post-" followed by a run of digits. I tried something like this:

tree.xpath("//div[starts-with(@id,'post-[0-9]')]")

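But this does not work. Is there a way to do this without importing regular expressions in Python?

Solution

XPath 1.0 does not support regular expressions, so the starts-with function compares its second argument as a literal string rather than as a pattern.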
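To make the literal comparison concrete, here is a minimal sketch. The sample markup and the variable names are made up for illustration and are not part of the original question; the only assumption is that lxml is installed.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
tree = etree.fromstring(
    "<html><body>"
    "<div id='post-123'>a</div>"
    "<div id='post-[0-9]x'>b</div>"
    "<div id='sidebar'>c</div>"
    "</body></html>"
)

# 'post-[0-9]' is compared character by character, so only an id that
# literally begins with the text "post-[0-9]" is selected.
literal = tree.xpath("//div[starts-with(@id, 'post-[0-9]')]")
literal_ids = [d.get("id") for d in literal]

# A plain prefix test does match 'post-123', but it cannot require
# that digits follow the dash.
prefix = tree.xpath("//div[starts-with(@id, 'post-')]")
prefix_ids = [d.get("id") for d in prefix]

print(literal_ids)
print(prefix_ids)
```

The first query misses `post-123` entirely, which is the behavior described in the question.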

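lxml does not support XPath 2.0. You have the following three options:

  • Switch to a processor that can handle XPath 2.0. You can then use the fn:matches() function.

  • Use an XPath 1.0 compliant solution. This is rather ugly, but it works and may in some circumstances be the easiest solution. However, it is not a general solution: it replaces every digit in @id with a - and matches against the result, so it also returns true if the original id was something like post--. Use a replacement character that you know will not occur at this position.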

tree.xpath("//div[starts-with(translate(@id, '0123456789', '----------'), 'post--')]")

  • lxml supports the EXSLT namespaces, and you can use the regex functions from there. In my opinion this is the best solution.

regexpNS = "http://exslt.org/regular-expressions"
r = tree.xpath("//div[re:test(@id, '^post-[0-9]')]", namespaces={'re': regexpNS})
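The following sketch runs the translate() and EXSLT options side by side on a small made-up document, to show where they differ. The sample markup, the `ids` helper, and the stricter pattern `'^post-[0-9]+$'` are assumptions added for illustration; they do not come from the original answer.

```python
from lxml import etree

# Hypothetical sample markup, invented for this illustration.
tree = etree.fromstring(
    "<html><body>"
    "<div id='post-123'>a</div>"
    "<div id='post--'>b</div>"
    "<div id='post-9abc'>c</div>"
    "<div id='sidebar'>d</div>"
    "</body></html>"
)

def ids(elements):
    """Return the id attribute of each element (helper for this sketch)."""
    return [e.get("id") for e in elements]

# XPath 1.0 option: every digit becomes '-', so 'post--' is a false positive.
via_translate = ids(tree.xpath(
    "//div[starts-with(translate(@id, '0123456789', '----------'), 'post--')]"
))

# EXSLT option: the answer's pattern requires one digit right after 'post-'.
ns = {"re": "http://exslt.org/regular-expressions"}
via_regex = ids(tree.xpath("//div[re:test(@id, '^post-[0-9]')]", namespaces=ns))

# Assumed stricter variant: anchor both ends so only digits may follow.
via_strict = ids(tree.xpath("//div[re:test(@id, '^post-[0-9]+$')]", namespaces=ns))

print(via_translate)
print(via_regex)
print(via_strict)
```

Neither EXSLT query needs `import re` in the calling code, which fits the constraint in the question. Whether the prefix test or the fully anchored pattern is appropriate depends on how strictly "post- followed by digits" should be read.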
