Web page scraping gems/tools available in Ruby
Question
I'm trying to scrape web pages in a Ruby script that I'm working on. The purpose of the project is to show which ETFs and stock mutual funds are most compatible with the value investing philosophy.
Some examples of pages I'd like to scrape are:
http://finance.yahoo.com/q/pr?s=SPY+Profile
http://finance.yahoo.com/q/hl?s=SPY+Holdings
http://www.marketwatch.com/tools/mutual-fund/list/V
What web scraping tools do you recommend for Ruby, and why? Keep in mind that there are thousands of stock funds out there, so any tool I use has to be reasonably quick.
I am new to Ruby, but I have experience using lxml to scrape web pages in Python (https://github.com/jhsu802701/dopplervalueinvesting/blob/master/screen.py). Once the pages on 5000+ stocks are downloaded, lxml can scrape them all in just a few minutes. (I remember trying BeautifulSoup but rejecting it because it was too slow.)
Recommended answer
There are many scraping gems available in Ruby, such as Hpricot and Nokogiri. I recommend Nokogiri for scraping static web pages. If you are scraping dynamic web pages (that is, pages that involve clicking buttons, submitting forms, and so on), I recommend Mechanize, which uses Nokogiri internally.