XPath expression for regex-like matching?


Question

I want to search an HTML document for div ids that follow a certain pattern. As a regex, the pattern I want to match is:

foo_([[:digit:]]{1,8})

using XPath. What is the XPath equivalent of the above pattern?

I'm stuck at //div[@id="foo_ and don't know what comes next. Could someone complete this into a valid expression?

Edit

Sorry, I should have elaborated. The prefix is actually post_message_, not foo_.

By the way, I use mechanize/nokogiri (Ruby).

Here is the snippet:

html_doc = Nokogiri::HTML(open(myfile))
message_div =  html_doc.xpath('//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]') 

It still fails. Error message:

Couldn't evaluate expression '//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]' (Nokogiri::XML::XPath::SyntaxError)

Recommended answer

How about this (updated):

XPath 1.0:

"//div[substring-before(@id, '_') = 'foo' 
       and substring-after(@id, '_') >= 0 
       and substring-after(@id, '_') <= 99999999]"

Edit #2: The OP changed the question. The following, even more reduced XPath 1.0 expression works for me:

"//div[substring(@id, 1, 13) = 'post_message_' 
       and substring(@id, 14) >= 0 
       and substring(@id, 14) <= 99999999]"

XPath 2.0 has a convenient matches() function:

"//div[matches(@id, '^foo_\d{1,8}$')]"

Apart from the better portability, I would expect the numerical expression (XPath 1.0 style) to perform better than the regex test, though this would only become noticeable when processing large data sets.

Original version of the answer:

"//div[substring-before(@id, '_') = 'foo' 
       and number(substring-after(@id, '_')) = substring-after(@id, '_') 
       and number(substring-after(@id, '_')) &gt;= 0 
       and number(substring-after(@id, '_')) &lt;= 99999999]"

The use of the number() function is unnecessary, because the mathematical comparison operators implicitly coerce their arguments to numbers. Any non-numeric string becomes NaN, and the greater-than/less-than tests then fail.

I also removed the encoding of the angle brackets, since this is an XML requirement, not an XPath requirement.
