Nokogiri and XPath: find all text between two tags

Views: 82 Published: 2018/6/14 20:17:55 Tags: html ruby xpath nokogiri
Problem description

I'm not sure if it's a matter of syntax or of differences between versions, but I can't seem to figure this out. I want to extract the data inside a (non-closing) td, from the h2 tag up to the h3 tag. Here is what the HTML looks like:
<td valign="top" width="350">
    <br><h2>NameIWant</h2><br>
    <br>Town<br>

    PhoneNumber<br>
    <a href="mailto:emailIwant@nowhere.com" class="links">emailIwant@nowhere.com</a>
    <br>
    <a href="http://websiteIwant.com" class="links">websiteIwant.com</a>
    <br><br>    
    <br><img src="images/spacer.gif"/><br>

    <h3><b>I want to stop before this!</b></h3>
    Lorem Ipsum Yadda Yadda<br>
    <img src="images/spacer.gif" border="0" width="20" height="11" alt=""/><br>
    <td width="25">
        <img src="images/spacer.gif" border="0" width="20" height="8" alt=""/>
        <td valign="top" width="200"><img src="images/spacer.gif"/>
            <br>
            <br>

            <table cellspacing="0" cellpadding="0" border="0"/>205"&gt;<tr><td>
                <a href="http://dontneedthis.com">
                </a></td></tr><br>
            <table border="0" cellpadding="3" cellspacing="0" width="200">
            ...
The <td valign> doesn't close until the very bottom of the page, which I think might be why I'm having problems.


My Ruby code looks like:
require 'open-uri'
require 'nokogiri'

@doc = Nokogiri::XML(open("http://www.url.com"))

content = @doc.css('//td[valign="top"] [width="350"]')

name = content.xpath('//h2').text
puts name # Returns NameIwant

townNumberLinks = content.search('//following::h2')
puts content # Returns <h2> NameIWant </h2>
As I understand it, the following axis should select "everything in the document after the closing tag of the current node". If I try to use preceding instead, like:
townNumberLinks = content.search('//preceding::h3')
# I get: <h3><b>I want to stop before this!</b></h3>
Hope I made it clear what I'm trying to do.  Thanks!
Solution
It's not trivial. In the context of the nodes you selected (the td), to get everything between two elements, you need to perform an intersection of these two sets:

Set A: All the nodes preceding the first h3: //h3[1]/preceding::node()
Set B: All the nodes following the first h2: //h2[1]/following::node()
To perform an intersection, you can use the Kaysian method (after Michael Kay, who proposed it). The basic formula is:
A[count(.|B) = count(B)]
Applying it to your sets, as defined above, where A = //h3[1]/preceding::node(), and B = //h2[1]/following::node(), we have:
//h3[1]/preceding::node()[ count( . | //h2[1]/following::node()) = count(//h2[1]/following::node()) ]
which will select all elements and text nodes starting with the first <br> after the </h2> tag, to the whitespace text node after the last <br>, just before the next <h3> tag.

You can easily select just the text nodes between h2 and h3 by replacing node() with text() in the expression. This one will return all text nodes (including whitespace and line breaks) between the two headers:
//h3[1]/preceding::text()[ count( . | //h2[1]/following::text()) = count(//h2[1]/following::text()) ]


                        