XPath to select between two HTML comments?

Views: 140 | Published: 2018/6/19 20:37:55 | Tags: html ruby xpath nokogiri scraper

Question

I have a big HTML page, but I want to select only certain nodes using XPath:

    <html>
     ........
    <!-- begin content -->
    <div>some text</div>
    <div><p>Some more elements</p></div>
    <!-- end content -->
     .......
    </html>

I can select the HTML after the <!-- begin content --> comment using:

    "//comment()[. = ' begin content ']/following::*"

I can also select the HTML before the <!-- end content --> comment using:

    "//comment()[. = ' end content ']/preceding::*"

But what XPath do I need to select all the HTML between the two comments?

Solution

I would look for elements that are preceded by the first comment and followed by the second comment:

    doc.xpath("//*[preceding::comment()[. = ' begin content ']]
                  [following::comment()[. = ' end content ']]")
    #=> <div>some text</div>
    #=> <div>
    #=>   <p>Some more elements</p>
    #=> </div>
    #=> <p>Some more elements</p>

Note that the above gives you every element in between, at any depth. This means that if you iterate through the returned nodes, nested content shows up twice: the "Some more elements" paragraph appears once inside its parent <div> and once as a node of its own.

I think you might actually want just the top-level nodes in between, i.e. the siblings of the comments. This can be done using the preceding-sibling and following-sibling axes instead:

    doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
                  [following-sibling::comment()[. = ' end content ']]")
    #=> <div>some text</div>
    #=> <div>
    #=>   <p>Some more elements</p>
    #=> </div>
Update - Including comments

Using //* only returns element nodes, so comments (and other node types, such as text nodes) are left out. You could change * to node() to return everything:

    puts doc.xpath("//node()[preceding-sibling::comment()[. = 'begin content']]
                           [following-sibling::comment()[. = 'end content']]")
    #=>
    #=> <!-- keywords1: first_keyword -->
    #=>
    #=> <div>html</div>
    #=>

The empty #=> lines are the whitespace-only text nodes between the tags.
If you just want element nodes and comments (i.e. not everything), you can use the self axis:

    doc.xpath("//node()[self::* or self::comment()]
                      [preceding-sibling::comment()[. = 'begin content']]
                      [following-sibling::comment()[. = 'end content']]")
    #=> <!-- keywords1: first_keyword -->
    #=> <div>html</div>
