How to navigate the DOM using Nokogiri
Problem description
I'm trying to fill the variables parent_element_h1 and parent_element_h2. Can anyone help me use Nokogiri to get the information I need into those variables?
require 'rubygems'
require 'nokogiri'

value = Nokogiri::HTML.parse(<<-HTML_END)
  <html>
    <body>
      <p id='para-1'>A</p>
      <div class='block' id='X1'>
        <h1>Foo</h1>
        <p id='para-2'>B</p>
      </div>
      <p id='para-3'>C</p>
      <h2>Bar</h2>
      <p id='para-4'>D</p>
      <p id='para-5'>E</p>
      <div class='block' id='X2'>
        <p id='para-6'>F</p>
      </div>
    </body>
  </html>
HTML_END

parent = value.css('body').first

# start_here is given: a Nokogiri::XML::Element for the <div> with the id 'X2'
start_here = parent.at('div.block#X2')

# This should be a Nokogiri::XML::Element for the nearest previous h1.
# In this example it's the one with the value 'Foo'.
parent_element_h1 = nil # placeholder: this is what I need to fill in

# This should be a Nokogiri::XML::Element for the nearest previous h2.
# In this example it's the one with the value 'Bar'.
parent_element_h2 = nil # placeholder: this is what I need to fill in
Please note: The start_here element could be anywhere inside the document. The HTML data is just an example. That said, the headers <h1> and <h2> could be siblings of start_here or children of a sibling of start_here.
The following recursive method is a good starting point, but it doesn't find the <h1>, because that element is a child of a sibling of start_here and the method only walks the siblings themselves:
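# Walks backwards through the previous siblings of _block and returns the
# first node whose tag name equals _style, or false if there is none.
def search_element(_block, _style)
  unless _block.nil?
    if _block.name == _style
      return _block
    else
      search_element(_block.previous, _style)
    end
  else
    return false
  end
end

parent_element_h1 = search_element(start_here, 'h1')
parent_element_h2 = search_element(start_here, 'h2')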
After accepting an answer, I came up with my own solution.
Recommended answer
I came across this a few years too late, I suppose, but I felt compelled to post because all the other solutions are way too complicated.
It's a single statement with XPath:
# doc is the parsed document (it is called value in the question)
start = doc.at('div.block#X2')
start.at_xpath('(preceding-sibling::h1 | preceding-sibling::*//h1)[last()]')
#=> <h1>Foo</h1>
start.at_xpath('(preceding-sibling::h2 | preceding-sibling::*//h2)[last()]')
#=> <h2>Bar</h2>
This accommodates either direct previous siblings or descendants of previous siblings. Because the parenthesized union is evaluated in document order, the last() predicate ensures that you get the closest previous match, regardless of which branch matched.