Web page scraping gems/tools available in Ruby


Question

I'm trying to scrape web pages in a Ruby script that I'm working on. The purpose of the project is to show which ETFs and stock mutual funds are most compatible with the value investing philosophy.

Some examples of pages I'd like to scrape are:

http://finance.yahoo.com/q/pr?s=SPY+Profile
http://finance.yahoo.com/q/hl?s=SPY+Holdings
http://www.marketwatch.com/tools/mutual-fund/list/V

What web scraping tools do you recommend for Ruby, and why? Keep in mind that there are thousands of stock funds out there, so any tool I use has to be reasonably quick.

I am new to Ruby, but I have experience using lxml to scrape web pages in Python (https://github.com/jhsu802701/dopplervalueinvesting/blob/master/screen.py). Once the pages for 5000+ stocks are downloaded, lxml can scrape them all in just a few minutes. (I remember trying BeautifulSoup but rejecting it because it was too slow.)

Recommended answer

There are many scraping gems available in Ruby, such as Hpricot and Nokogiri. I recommend Nokogiri for scraping static web pages. If you are scraping dynamic web pages (that is, pages that involve clicking buttons, submitting forms, and so on), I recommend Mechanize, which uses Nokogiri internally.
