Extract data from an HTML table with Mechanize
Question
First of all, here is the sample HTML table:
<tr>
<td><strong>Kangchenjunga </strong></td>
<td>8,586m<br /></td>
<td>28,169ft</td>
<td><div align="center">Nepal/India </div></td>
<td>1955; G. Band, J. Brown </td>
</tr>
ARGV[0] will hold the name of a mountain (the first column), and the return value should be the last column: the people who first climbed the mountain.
So I need to check whether a row's first column matches ARGV[0], and if it does, return the last column without the date.
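require 'mechanize'

# Mechanize needs an absolute URL, including the scheme
body = Mechanize.new.get('http://www.alpineascents.com/8000m-peaks.asp').body
# Prints "ok" when the mountain name directly follows a <strong> tag in the page
if body.include?('<strong>' + ARGV[0])
  puts 'ok'
end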
This is what I have so far; it prints "ok" if ARGV[0] appears in the body of the HTML document. How can I get the last column of the same row in which ARGV[0] is found?
Example:
<tr>
<td><strong>GIVE THIS AS A PARAMETER </strong></td>
<td>SKIP THIS<br /></td>
<td>SKIP THIS</td>
<td><div align="center">SKIP THIS</div></td>
<td>I WANT IT TO RETURN THIS</td>
</tr>
I'm really new to Ruby.
Recommended answer
A more succinct version that relies more on the black magic of XPath :)
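require 'nokogiri'
require 'open-uri'

# Ruby 3.0+ needs URI.open; Kernel#open no longer accepts URLs
doc = Nokogiri::HTML(URI.open('http://www.alpineascents.com/8000m-peaks.asp'))
# Fifth cell of the row whose <strong> text equals the name;
# normalize-space ignores the trailing space seen in the sample row
last_td = doc.xpath("//tr[td[strong[normalize-space(.)='#{ARGV[0]}']]]/td[5]")
# Drop everything up to the semicolon (the year) and trim whitespace
puts last_td.text.gsub(/.*?;/, '').strip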
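The date-stripping step in the last line can be tried on its own, without fetching the page. Below is a minimal sketch that assumes the cell text always has the form "year; names", as in the sample row. The helper name `climbers_only` is hypothetical and not part of Nokogiri or Mechanize.

```ruby
# Hypothetical helper (not a library method): removes the leading "year;"
# prefix from the text of the last table cell.
def climbers_only(cell_text)
  # Delete everything up to and including the first semicolon,
  # then trim the surrounding whitespace.
  cell_text.sub(/\A.*?;/, '').strip
end

puts climbers_only('1955; G. Band, J. Brown ')  # prints "G. Band, J. Brown"
```

Unlike `gsub(/.*?;/, '')`, this only removes the first "…;" segment, so a cell that happened to contain a second semicolon would keep the text after the first one. If the cell has no semicolon at all, the text is returned unchanged apart from the trimming.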