XPath expression for regex-like matching?
Question
I want to search an HTML document for div ids that follow a certain pattern. As a regex, the pattern is:
foo_([[:digit:]]{1,8})
I want to match it using XPath. What is the XPath equivalent of the above pattern?
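As a reference point, here is the pattern as a Ruby regex. This is a sketch that assumes `{1,8}` (one to eight digits) is intended, and it anchors with `\A`/`\z` so the whole id has to match. `id_digits` is a hypothetical helper and the sample ids are made up.

```ruby
# The pattern anchored to the whole string; the capture group holds the 1 to 8 digits.
ID_PATTERN = /\Afoo_([[:digit:]]{1,8})\z/

# Hypothetical helper: returns the captured digits, or nil when the id does not match.
def id_digits(id)
  m = ID_PATTERN.match(id)
  m && m[1]
end

%w[foo_42 foo_123456789 foo_ bar_7].each do |id|
  puts "#{id.inspect} -> #{id_digits(id).inspect}"
end
```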
I'm stuck with //div[@id="foo_ and don't know how to continue. Could someone complete it into a valid expression?
Edit
Sorry, I should have been more specific: the prefix is actually not foo_, it's post_message_.
Btw, I use Mechanize/Nokogiri (Ruby).
Here's the snippet:
html_doc = Nokogiri::HTML(open(myfile))
message_div = html_doc.xpath('//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]')
Still failed. Error message:
Couldn't evaluate expression '//div[substring(@id,13) = "post_message_" and substring-after(@id, "post_message_") => 0 and substring-after(@id, "post_message_") <= 99999999]' (Nokogiri::XML::XPath::SyntaxError)
Recommended answer
How about this (updated):
XPath 1.0:
"//div[substring-before(@id, '_') = 'foo'
and substring-after(@id, '_') >= 0
and substring-after(@id, '_') <= 99999999]"
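This version relies on `substring-before` and `substring-after` splitting the id at the first `_`. That works for `foo_123`, but for an id like `post_message_123` the split gives `post` and `message_123`, and the latter is not a number. A minimal sketch of the two functions in Ruby (the helpers `substring_before`/`substring_after` are hypothetical stand-ins, not Nokogiri API):

```ruby
# Stand-ins for XPath 1.0 substring-before/substring-after:
# both split at the FIRST occurrence of sep and return "" if sep is absent.
def substring_before(str, sep)
  i = str.index(sep)
  i ? str[0, i] : ""
end

def substring_after(str, sep)
  i = str.index(sep)
  i ? str[(i + sep.length)..-1] : ""
end

["foo_123", "post_message_123"].each do |id|
  puts "#{id}: before=#{substring_before(id, '_').inspect} after=#{substring_after(id, '_').inspect}"
end
```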
Edit #2: The OP has changed the question. The following, even shorter XPath 1.0 expression works for me:
"//div[substring(@id, 1, 13) = 'post_message_'
and substring(@id, 14) >= 0
and substring(@id, 14) <= 99999999]"
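The positions follow from XPath 1.0 `substring()` counting from 1, with an optional third argument that is a length. Since `post_message_` is 13 characters long, `substring(@id, 1, 13)` is the prefix and `substring(@id, 14)` is everything after it. By contrast, the question's `substring(@id,13)` returns the tail starting at the 13th character (`_` plus the digits), so it can never equal `"post_message_"`. The SyntaxError itself comes from `=>`, which is not an XPath operator (the relational operators are `<`, `<=`, `>`, `>=`). The sketch below mimics `substring()` for integer arguments only. `xpath_substring` is a hypothetical helper, and the spec's rounding rules for non-integer arguments are ignored.

```ruby
# Integer-only sketch of XPath 1.0 substring(s, start[, length]):
# positions are 1-based; the result holds positions p with
# start <= p < start + length (or up to the end when length is omitted).
def xpath_substring(str, start, length = nil)
  first = [start, 1].max                             # there is no position before 1
  stop  = length ? start + length : str.length + 1   # exclusive end position
  return "" if stop <= first
  str[(first - 1)...(stop - 1)] || ""
end

id = "post_message_123"
p "post_message_".length      # 13
p xpath_substring(id, 13)     # "_123" (the question's version)
p xpath_substring(id, 1, 13)  # "post_message_"
p xpath_substring(id, 14)     # "123"
```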
XPath 2.0 has a convenient matches() function:
"//div[matches(@id, '^foo_\d{1,8}$')]"
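Note that Nokogiri's XPath engine (libxml2 on CRuby) implements XPath 1.0 only, so `matches()` is not available there. A common fallback is to select candidate divs with XPath and then apply the regex in Ruby. When porting the pattern, `^`/`$` need care: in XPath 2.0 `matches()` without the `m` flag they anchor the whole string, whereas in a Ruby regex they match at line boundaries. `\A`/`\z` are the whole-string equivalents. A small illustration with a made-up input:

```ruby
# Ruby's ^ and $ anchor at line boundaries; \A and \z anchor the whole string,
# which is what ^ and $ mean in XPath 2.0 matches() without the "m" flag.
LINE_ANCHORED   = /^foo_\d{1,8}$/
STRING_ANCHORED = /\Afoo_\d{1,8}\z/

id = "foo_42\nsomething else"
p LINE_ANCHORED.match?(id)          # true (matches the first line only)
p STRING_ANCHORED.match?(id)        # false
p STRING_ANCHORED.match?("foo_42")  # true
```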
Apart from the better portability, I would expect the numerical expression (XPath 1.0 style) to perform better than the regex test, though this would only become noticeable when processing large data sets.
Original version of the answer:
"//div[substring-before(@id, '_') = 'foo'
and number(substring-after(@id, '_')) = substring-after(@id, '_')
and number(substring-after(@id, '_')) >= 0
and number(substring-after(@id, '_')) <= 99999999]"
The number() function is unnecessary, because the comparison operators implicitly coerce their arguments to numbers: any non-number becomes NaN, and the greater-than/less-than tests then fail.
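The coercion argument can be sketched in Ruby. XPath 1.0 converts a string to a number only if it consists of optional whitespace, an optional minus sign, digits with an optional decimal point, and optional whitespace. Anything else becomes NaN, and every comparison involving NaN is false. The helper names below are hypothetical. The sample strings also show where the numeric test and an anchored `\d{1,8}` regex disagree: a decimal point or extra leading zeros still pass the XPath 1.0 test.

```ruby
# Sketch of XPath 1.0 string-to-number conversion:
# optional XPath whitespace, optional minus, digits with optional '.', else NaN.
XPATH_NUMBER = /\A[ \t\r\n]*(-?(?:\d+(?:\.\d*)?|\.\d+))[ \t\r\n]*\z/

def xpath_number(str)
  m = XPATH_NUMBER.match(str)
  m ? m[1].to_f : Float::NAN
end

# Both relational tests from the answer; any comparison with NaN is false.
def in_range?(str)
  n = xpath_number(str)
  n >= 0 && n <= 99_999_999
end

# Regex version of the same intent, for comparison.
def regex_digits?(str)
  /\A\d{1,8}\z/.match?(str)
end

["123", "abc", "", "1.5", "000000001", "123456789"].each do |s|
  puts "#{s.inspect}: xpath=#{in_range?(s)} regex=#{regex_digits?(s)}"
end
```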
I also removed the encoding of the angle brackets, since this is an XML requirement, not an XPath requirement.