Watir scraping sequential elements: so simple, but no


Question

This should be so simple... I want to scrape a web page like this with Watir (a Ruby gem):

<div class="Time">time1</div> 
<div class="Locus">locus1</div>
<div class="Locus">locus2</div>
<div class="Time">time2</div>
<div class="Locus">locus3</div>
<div class="Time">time3</div>
<div class="Locus">locus4</div>
<div class="Locus">locus5</div>
<div class="Locus">locus6</div>
<div class="Time">time4</div>
etc.

The result should be an array like this:

time1 locus1
time1 locus2
time2 locus3
time3 locus4
time3 locus5
time3 locus6
time4 xxx

All the divs are at the same level (not nested). I can't find a way to do this using the Watir methods... Thanks for your help.

Solution

For each Locus element, you can retrieve the preceding Time element via the #preceding_sibling method:

result = browser.divs(class: 'Locus').map do |div|
  time = div.preceding_sibling(class: 'Time').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]

Note that if the list is long, you may want to retrieve the HTML via Watir but then do the parsing in Nokogiri. Each Watir lookup is a separate round trip to the browser, so parsing the HTML once in Nokogiri would save a lot of execution time, but at the cost of readability.

require 'nokogiri'

doc = Nokogiri::HTML.parse(browser.html) # where `browser` is the usual Watir::Browser
result = doc.css('.Locus').map do |div|
  # [1] selects the nearest preceding Time; without it, `at` returns the first Time in document order
  time = div.at('./preceding-sibling::div[@class="Time"][1]').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]
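
Both snippets above start from the Locus elements, so a Time with no Locus after it (the "time4 xxx" line in the question) never shows up in the result. If that line is needed, one alternative is a single pass over all the divs in document order, remembering the most recent Time. The following is only a sketch of that idea, not the answer's method: `pair_times_with_loci` is a hypothetical helper, the 'xxx' placeholder is taken from the question, and the input is assumed to be `[class, text]` pairs already extracted from the page.

```ruby
# Hypothetical helper (not part of Watir or Nokogiri): walks [css_class, text]
# pairs in document order and pairs each Locus with the most recent Time.
# A Time that has no Locus after it is emitted with a placeholder.
def pair_times_with_loci(pairs, placeholder: 'xxx')
  result = []
  current_time = nil
  pending = false # true while the current Time has not been paired with a Locus

  pairs.each do |css_class, text|
    case css_class
    when 'Time'
      result << "#{current_time} #{placeholder}" if pending
      current_time = text
      pending = true
    when 'Locus'
      result << "#{current_time} #{text}"
      pending = false
    end
  end

  # The last Time may also be left without a Locus.
  result << "#{current_time} #{placeholder}" if pending
  result
end

# Sample data mirroring the HTML in the question. With Nokogiri, such pairs
# could presumably be built with something like:
#   doc.css('div.Time, div.Locus').map { |d| [d['class'], d.text] }
pairs = [
  ['Time', 'time1'], ['Locus', 'locus1'], ['Locus', 'locus2'],
  ['Time', 'time2'], ['Locus', 'locus3'],
  ['Time', 'time3'], ['Locus', 'locus4'], ['Locus', 'locus5'], ['Locus', 'locus6'],
  ['Time', 'time4']
]
p pair_times_with_loci(pairs)
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6", "time4 xxx"]
```

Since the loop visits each element once, it also avoids running a sibling lookup per Locus, which may matter on long lists.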
