Get Mechanize to go through x number of links and get all the titles?

Question

Basically, I want to use Mechanize to go through all the pages from A to Z on this site: http://www.tv.com/shows/sort/a_z/

Then, for each letter, I want to get the title of every show on all of that letter's pages. At the moment I am just trying to get it working for the letter "a". This is what I have so far, but I don't know where to go from here.

require 'mechanize'

agent=Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click

Answer

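You just need some XPath to find the content you need and to navigate: select the letter links, collect the show titles on each letter's first page, then keep following the "next" pagination link until there isn't one.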

require 'mechanize'
require 'pp'

shows = []
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'

# Visit every letter link in the alphabet bar, skipping the currently selected entry.
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the titles on the first page for this letter.
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }

  # Follow the "next" pagination link until there are no more pages.
  while (next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]'))
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end

pp shows
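
If you want to try the pagination logic without hitting the site, here is a minimal offline sketch. It swaps Mechanize/Nokogiri for REXML from Ruby's standard distribution, and the PAGES hash, the show names, the hrefs, and the collect_titles helper are all made up for illustration. The markup only mirrors what the XPath expressions above imply and is not the site's real HTML. It also keeps track of visited paths, a small addition so that a "next" link pointing back to an earlier page cannot loop forever.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the site: path => well-formed markup.
# The structure is assumed from the XPath expressions in the answer.
PAGES = {
  '/shows/sort/a/' => <<~HTML,
    <html><body>
      <ul>
        <li class="show"><a href="/shows/alf/">ALF</a></li>
        <li class="show"><a href="/shows/angel/">Angel</a></li>
      </ul>
      <div class="_pagination"><a class="next" href="/shows/sort/a/page2/">Next</a></div>
    </body></html>
  HTML
  '/shows/sort/a/page2/' => <<~HTML
    <html><body>
      <ul>
        <li class="show"><a href="/shows/arrow/">Arrow</a></li>
      </ul>
      <div class="_pagination"></div>
    </body></html>
  HTML
}.freeze

# Collects show titles starting at start_path and follows "next" links
# until a page has none (or points to a page that was already visited).
def collect_titles(start_path, pages)
  titles = []
  visited = {}
  path = start_path

  while path && !visited[path]
    visited[path] = true
    doc = REXML::Document.new(pages.fetch(path))

    # Same expression the answer uses for show links.
    REXML::XPath.each(doc, '//li[@class="show"]/a') { |a| titles << a.text }

    # Same expression the answer uses for the pagination link.
    next_link = REXML::XPath.first(doc, '//div[@class="_pagination"]//a[@class="next"]')
    path = next_link && next_link.attributes['href']
  end

  titles
end

puts collect_titles('/shows/sort/a/', PAGES)
```

Putting the title extraction at the top of a single loop also avoids repeating the search line before and inside the while loop, which is the one piece of duplication in the Mechanize version.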
