Watir scraping sequential elements: so simple, but no
Problem description
This seems so simple... I want to scrape a web page like the following with Watir (a Ruby gem):
<div class="Time">time1</div>
<div class="Locus">locus1</div>
<div class="Locus">locus2</div>
<div class="Time">time2</div>
<div class="Locus">locus3</div>
<div class="Time">time3</div>
<div class="Locus">locus4</div>
<div class="Locus">locus5</div>
<div class="Locus">locus6</div>
<div class="Time">time4</div>
etc.
The result should be an array like this:
time1 locus1
time1 locus2
time2 locus3
time3 locus4
time3 locus5
time3 locus6
time4 xxx
All the divs are at the same level (not nested). I can't find a solution using the Watir methods. Thanks for your help.
For each Locus element, you can retrieve the preceding Time element via the #preceding_sibling method:
result = browser.divs(class: 'Locus').map do |div|
  time = div.preceding_sibling(class: 'Time').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]
Note that if the list is long, you may want to retrieve the HTML via Watir and then do the parsing in Nokogiri. Each Watir lookup is a separate round trip to the browser, so parsing the HTML locally would save a lot of execution time, but at the cost of readability.
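require 'nokogiri'

doc = Nokogiri::HTML.parse(browser.html) # where `browser` is the usual Watir::Browser
result = doc.css('.Locus').map do |div|
  # [1] on the reverse preceding-sibling axis selects the nearest preceding Time div;
  # without it, the first match in document order (always time1) is returned
  time = div.at_xpath('./preceding-sibling::div[@class="Time"][1]').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]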
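Since all the divs are siblings in document order, the same pairing can also be done in a single pass that remembers the most recent Time, instead of looking backwards from each Locus. The following is only a minimal sketch under stated assumptions: `pair_loci` is a hypothetical helper name, and the `rows` array is hard-coded sample data standing in for `[class, text]` pairs that you would collect yourself (for example from the Nokogiri document, or from Watir via `attribute_value('class')` and `text`).

```ruby
# Hypothetical sample input: [class, text] pairs in document order.
# In a real script these might be collected with something like
#   browser.divs.map { |d| [d.attribute_value('class'), d.text] }
# (an assumption; that line is not executed here).
rows = [
  ['Time', 'time1'], ['Locus', 'locus1'], ['Locus', 'locus2'],
  ['Time', 'time2'], ['Locus', 'locus3'],
  ['Time', 'time3'], ['Locus', 'locus4'], ['Locus', 'locus5'], ['Locus', 'locus6'],
  ['Time', 'time4']
]

# pair_loci is a hypothetical helper, not part of Watir or Nokogiri.
# It walks the rows once and pairs each Locus with the last Time seen.
def pair_loci(rows)
  current_time = nil
  rows.each_with_object([]) do |(klass, text), result|
    case klass
    when 'Time'  then current_time = text
    when 'Locus' then result << "#{current_time} #{text}"
    end
  end
end

result = pair_loci(rows)
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]
```

Like the snippets above, this skips a Time that has no Locus after it (time4 here). A Locus that appears before any Time would be paired with an empty time, so you may want to guard against that case.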