Get Mechanize to go through x number of links and get all the titles?
Problem description
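Basically, I want to use Mechanize to go through all the pages from a to z on this site: http://www.tv.com/shows/sort/a_z/
Then, for each letter, I want to get the title of every show on all the pages for that letter. At the moment I am just trying to get it to work with the letter "a". This is what I have so far, but I don't know where to go from here: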
require 'mechanize'
agent = Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click
Answer
You just need to use some XPath to find the content you need and to navigate between pages:
require 'mechanize'
require 'pp'

shows = []
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'

# Visit each letter link in the alphabet bar, skipping the currently selected entry.
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the show titles on the first page for this letter.
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }

  # Follow the "next" pagination link until there is none, collecting titles on each page.
  while (next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]'))
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end

pp shows
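The title-collecting line appears twice in the code above, once for the first page and once inside the pagination loop. One possible way to avoid that repetition is to pull the "follow the next link until there is none" logic into a helper. The sketch below is only an illustration: `Page`, `collect_titles`, `FAKE_SITE`, and the `?page=2` URL are hypothetical names made up for this example, and an in-memory hash stands in for the real site so it runs without network access.

```ruby
# Page is a hypothetical stand-in for a fetched listing page:
# `titles` holds the show names on it, `next_url` the "next" link target
# (nil on the last page).
Page = Struct.new(:titles, :next_url)

# Walk a paginated listing starting at start_url and return every title found.
# `fetch` is any callable that maps a URL to a Page-like object.
def collect_titles(start_url, fetch)
  titles = []
  visited = {}
  url = start_url
  # Stop when there is no "next" link, or when a URL repeats (guards against a loop).
  while url && !visited[url]
    visited[url] = true
    page = fetch.call(url)
    titles.concat(page.titles)
    url = page.next_url
  end
  titles
end

# In-memory fake site (assumed data, not taken from tv.com).
FAKE_SITE = {
  '/shows/sort/a/'        => Page.new(['Alias', 'Angel'], '/shows/sort/a/?page=2'),
  '/shows/sort/a/?page=2' => Page.new(['Arrow'], nil)
}.freeze

fetch = ->(url) { FAKE_SITE.fetch(url) }
puts collect_titles('/shows/sort/a/', fetch).inspect
```

With Mechanize, `fetch` could instead call `agent.get(url)` and build a `Page` from the same two XPath queries used in the answer: `page.search('//li[@class="show"]/a')` for the titles and `page.at('//div[@class="_pagination"]//a[@class="next"]')` for the next link.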