How can I get all links of a website using the Mechanize gem?
Question
How can I get all links of a website using the Ruby Mechanize gem? Can Mechanize crawl a whole site the way the Anemone gem does?
Anemone.crawl("https://www.google.com.vn/") do |anemone|
  anemone.on_every_page do |page|
    puts page.url
  end
end
I'm a newbie at web crawling. Thanks in advance!
Answer
It's quite simple with Mechanize, and I suggest you read the documentation. You can start with the Ruby BastardBook.
To get all the links from a page with Mechanize, try this:
require 'mechanize'

agent = Mechanize.new
page = agent.get("http://example.com")

# Each link is a Mechanize::Page::Link with #text and #href accessors.
page.links.each { |link| puts "#{link.text} => #{link.href}" }
The code is clear, I think. page is a Mechanize::Page object that stores the whole content of the retrieved page, and Mechanize::Page provides the links method.
Mechanize is very powerful, but remember: if you want to scrape without any interaction with the website, use Nokogiri directly. Mechanize itself uses Nokogiri to parse the web, so for scraping alone Nokogiri is enough.