XPath to select between two HTML comments?

I have a big HTML page, but I only want to select certain nodes from it using XPath:

<html>
 ........
<!-- begin content -->
 <div>some text</div>
 <div><p>Some more elements</p></div>
<!-- end content -->
.......
</html>

I can select HTML after the <!-- begin content --> using:

"//comment()[. = ' begin content ']/following::*" 

Also I can select HTML before the <!-- end content --> using:

"//comment()[. = ' end content ']/preceding::*" 

But how do I write an XPath that selects all the HTML between the two comments?

Solution

I would look for elements that are preceded by the first comment and followed by the second comment:

doc.xpath("//*[preceding::comment()[. = ' begin content ']]
              [following::comment()[. = ' end content ']]")
#=> <div>some text</div>
#=> <div>
#=>   <p>Some more elements</p>
#=> </div>
#=> <p>Some more elements</p>

Note that the above returns every element between the two comments, at any depth. This means that if you iterate over the returned nodes, nested elements show up twice: once inside their parent and once on their own, e.g. the "Some more elements" paragraph.

I think you probably just want the top-level nodes in between, i.e. the siblings of the comments. This can be done with the preceding-sibling and following-sibling axes instead.

doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
              [following-sibling::comment()[. = ' end content ']]")
#=> <div>some text</div>
#=> <div>
#=>   <p>Some more elements</p>
#=> </div>

Update - Including comments

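Using //* returns only element nodes, so comments (and other node types, such as text) are left out. You can change * to node() to return everything. Keep in mind that the comparison against the comment text is exact, including any surrounding whitespace, so the string in the predicate has to match how the markers are written in your document.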

puts doc.xpath("//node()[preceding-sibling::comment()[. = 'begin content']]
                        [following-sibling::comment()[. = 'end content']]")
#=> 
#=> <!--keywords1: first_keyword-->
#=> 
#=> <div>html</div>
#=> 

If you just want element nodes and comments (i.e. not text nodes and the rest), you can use the self axis:

doc.xpath("//node()[self::* or self::comment()]
                   [preceding-sibling::comment()[. = 'begin content']]
                   [following-sibling::comment()[. = 'end content']]")
#=> <!--keywords1: first_keyword-->
#=> <div>html</div>
