How to navigate the DOM using Nokogiri


Question


I'm trying to fill the variables parent_element_h1 and parent_element_h2. Can anyone help me use Nokogiri to get the information I need into those variables?

require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <div class='block' id='X1'>
        <h1>Foo</h1>
        <p id='para-2'>B</p>
      </div>
      <p id='para-3'>C</p>
      <h2>Bar</h2>
      <p id='para-4'>D</p>
      <p id='para-5'>E</p>
      <div class='block' id='X2'>
        <p id='para-6'>F</p>
      </div>
    </body>
  </html>
HTML_END

parent = value.css('body').first

# start_here is given: A Nokogiri::XML::Element of the <div> with the id 'X2'
start_here = parent.at('div.block#X2')

# this should be a Nokogiri::XML::Element of the nearest, previous h1.
# in this example it's the one with the value 'Foo'
parent_element_h1 = nil # ???

# this should be a Nokogiri::XML::Element of the nearest, previous h2.
# in this example it's the one with the value 'Bar'
parent_element_h2 = nil # ???






Please note: The start_here element could be anywhere inside the document. The HTML data is just an example. That said, the headers <h1> and <h2> could be a sibling of start_here or a child of a sibling of start_here.


The following recursive method is a good starting point, but it doesn't work on <h1> because it's a child of a sibling of start_here:

def search_element(_block,_style)
  unless _block.nil?
    if _block.name == _style
      return _block
    else
      search_element(_block.previous,_style)
    end
  else
    return false
  end
end

parent_element_h1 = search_element(start_here,'h1')
parent_element_h2 = search_element(start_here,'h2')






After accepting an answer, I came up with my own solution.

Answer


I came across this a few years too late I suppose, but felt compelled to post because all the other solutions are way too complicated.


It's a single statement with XPath:

start = doc.at('div.block#X2')

start.at_xpath('(preceding-sibling::h1 | preceding-sibling::*//h1)[last()]')
#=> <h1>Foo</h1>

start.at_xpath('(preceding-sibling::h2 | preceding-sibling::*//h2)[last()]')
#=> <h2>Bar</h2>


This accommodates either direct previous siblings or descendants of previous siblings. Regardless of which one matches, the last() predicate, applied to the parenthesized union in document order, ensures that you get the closest previous match.
