XPath to select between two HTML comments?
I have a large HTML page, and I want to select certain nodes from it using XPath:
<html>
........
<!-- begin content -->
<div>some text</div>
<div><p>Some more elements</p></div>
<!-- end content -->
.......
</html>
I can select the HTML after the <!-- begin content --> comment using:
"//comment()[. = ' begin content ']/following::*"
I can also select the HTML before the <!-- end content --> comment using:
"//comment()[. = ' end content ']/preceding::*"
But how do I write a single XPath that selects all the HTML between the two comments?
I would look for elements that are preceded by the first comment and followed by the second comment:
doc.xpath("//*[preceding::comment()[. = ' begin content ']]
[following::comment()[. = ' end content ']]")
#=> <div>some text</div>
#=> <div>
#=> <p>Some more elements</p>
#=> </div>
#=> <p>Some more elements</p>
Note that the above returns every element in between, at any depth. This means that if you iterate over the returned nodes, nested nodes show up twice: once inside their parent and once on their own, e.g. the "Some more elements" paragraph.
I think you might actually want just the top-level nodes in between, i.e. the siblings of the comments. This can be done using the preceding-sibling and following-sibling axes instead.
doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
[following-sibling::comment()[. = ' end content ']]")
#=> <div>some text</div>
#=> <div>
#=> <p>Some more elements</p>
#=> </div>
Update - Including comments
Using //* only returns element nodes, which excludes comments (and other node types, such as text nodes). You could change * to node() to return everything.
puts doc.xpath("//node()[preceding-sibling::comment()[. = 'begin content']]
[following-sibling::comment()[. = 'end content']]")
#=>
#=> <!--keywords1: first_keyword-->
#=>
#=> <div>html</div>
#=>
If you just want element nodes and comments (i.e. not everything), you can use the self axis:
doc.xpath("//node()[self::* or self::comment()]
[preceding-sibling::comment()[. = 'begin content']]
[following-sibling::comment()[. = 'end content']]")
#=> <!--keywords1: first_keyword-->
#=> <div>html</div>