How can I get all links of a website using the Mechanize gem?

Question

How can I get all the links of a website using the Ruby Mechanize gem? Can Mechanize crawl a whole site the way the Anemone gem does:

Anemone.crawl("https://www.google.com.vn/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end

I'm a newbie at web crawling. Thanks in advance!

Answer

It's quite simple with Mechanize, and I suggest you read the documentation. You can start with Ruby BastardBook.

To get all the links from a page with Mechanize, try this:

require 'mechanize'

agent = Mechanize.new
page = agent.get("http://example.com")
page.links.each {|link| puts "#{link.text} => #{link.href}"}

The code is clear, I think. page is a Mechanize::Page object that stores the whole content of the retrieved page, and Mechanize::Page provides a links method.
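Mechanize has no built-in site crawler like Anemone.crawl, but an Anemone-style breadth-first crawl can be built on top of it. Below is a minimal sketch; the crawl helper is hypothetical (not part of Mechanize), and the agent can be any object whose get returns pages responding to links, so a Mechanize instance works:

```ruby
require 'set'
require 'uri'

# Hypothetical helper, NOT part of Mechanize: a breadth-first crawl that
# stays on the starting host, similar in spirit to Anemone.crawl.
# `agent` is anything with a #get that returns pages responding to #links
# (e.g. Mechanize.new).
def crawl(start_url, agent)
  host    = URI(start_url).host
  queue   = [start_url]
  visited = Set.new

  until queue.empty?
    url = queue.shift
    next unless visited.add?(url)   # skip URLs we have already seen

    begin
      page = agent.get(url)
    rescue StandardError            # unreachable page: move on
      next
    end
    next unless page.respond_to?(:links)  # skip non-HTML responses

    yield url, page

    page.links.each do |link|
      uri = URI.join(url, link.href) rescue next  # resolve relative hrefs
      queue << uri.to_s if uri.host == host       # stay on the same host
    end
  end
end

# Usage with Mechanize (requires network access):
#   require 'mechanize'
#   crawl("https://www.google.com.vn/", Mechanize.new) { |url, _page| puts url }
```

The host check keeps the crawl from wandering off-site; in practice you would also want politeness delays and robots.txt handling, which Anemone gives you for free.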

Mechanize is very powerful, but remember that if you want to do scraping without any interaction with the website, use Nokogiri. Mechanize uses Nokogiri to scrape the web, so for scraping only, Nokogiri is enough.
