Scraping/Parsing Google search results in Ruby


Question

Assume I have the entire HTML of a Google search results page. Does anyone know of any existing code (Ruby?) to scrape/parse the first page of Google search results? Ideally it would handle the "Shopping Results" and "Video Results" sections that can spring up anywhere.

If not, what's the best Ruby-based tool for screen scraping in general?

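To be clear: I know it is hard or impossible to get Google search results programmatically or through an API, and that simply cURLing the results page runs into a lot of problems. There is consensus on Stack Overflow on both points. My question is different.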

Recommended answer

This should be very simple. Have a look at the "Screen Scraping with ScrAPI" screencast by Ryan Bates. You can also do without dedicated scraping libraries and just stick to something like Nokogiri.

From Nokogiri's documentation:

require 'nokogiri'
require 'open-uri'

# Get a Nokogiri::HTML::Document for the page we're interested in...
# (URI.open is required on Ruby 3.0+, where open-uri no longer patches Kernel#open)

doc = Nokogiri::HTML(URI.open('http://www.google.com/search?q=tenderlove'))

# Do funky things with it using Nokogiri::XML::Node methods...

####
# Search for nodes by css
doc.css('h3.r a.l').each do |link|
  puts link.content
end

####
# Search for nodes by xpath
doc.xpath('//h3/a[@class="l"]').each do |link|
  puts link.content
end

####
# Or mix and match.
doc.search('h3.r a.l', '//h3/a[@class="l"]').each do |link|
  puts link.content
end
