Using Nokogiri to scrape a value from Yahoo Finance?
Question
I wrote a simple script:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "http://au.finance.yahoo.com/q/bs?s=MYGN"
doc = Nokogiri::HTML(open(url))
name = doc.at_css("#yfi_rt_quote_summary h2").text
market_cap = doc.at_css("#yfs_j10_mygn").text
ebit = doc.at("//*[@id='yfncsumtab']/tbody/tr[2]/td/table[2]/tbody/tr/td/table/tbody/tr[11]/td[2]/strong").text
puts "#{name} - #{market_cap} - #{ebit}"
The script grabs three values from Yahoo Finance. The problem is that the ebit XPath returns nil. I got the XPath by copying it from the Chrome developer tools.
This is the page I'm trying to get the value from: http://au.finance.yahoo.com/q/bs?s=MYGN. The actual value is 483,992, in the Total Current Assets row.
Any help would be appreciated, especially if there is a way to get this value with CSS selectors.
Recommended answer
Nokogiri supports:
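require 'nokogiri'
require 'open-uri'
# URI.open is needed on Ruby 3.0+, where a bare open(url) no longer fetches URLs
doc = Nokogiri::HTML(URI.open("http://au.finance.yahoo.com/q/bs?s=MYGN"))
# Find the label cell, step over to the next cell, and keep only digits and commas
ebit = doc.at('strong:contains("Total Current Assets")').parent.next_sibling.text.gsub(/[^,\d]+/, '')
puts ebit
# >> 483,992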
I'm using the <strong> tag as a place-marker with the :contains pseudo-class, then backing up to the containing <td>, moving to the next <td> and grabbing its text, and finally using gsub(/[^,\d]+/, '') to remove everything that isn't a digit or a comma, which also strips the white space.
Nokogiri supports a number of jQuery's JavaScript extensions, which is why :contains works.