Too many connection resets Exception Error - Mechanize in Ruby

Problem Description

I'm using Mechanize in Ruby and keep getting this exception:

C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:158:in `rescue in rbuf_fill': too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 37920120, last used 1457465950.371121 seconds ago (Net::HTTP::Persistent::Error)
    from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:152:in `rbuf_fill'
    from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:134:in `readuntil'
    from C:/Ruby200/lib/ruby/2.0.0/net/protocol.rb:144:in `readline'
    from C:/Ruby200/lib/ruby/2.0.0/net/http/response.rb:39:in `read_status_line'
    from C:/Ruby200/lib/ruby/2.0.0/net/http/response.rb:28:in `read_new'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1406:in `block in transport_request'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1403:in `catch'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1403:in `transport_request'
    from C:/Ruby200/lib/ruby/2.0.0/net/http.rb:1376:in `request'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/rest-client-1.6.7/lib/restclient/net_http_ext.rb:51:in `request'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/net-http-persistent-2.9/lib/net/http/persistent.rb:986:in `request'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize/http/agent.rb:259:in `fetch'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:1281:in `post_form'
    from C:/Ruby200/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:548:in `submit'
    from C:/Users/Feshaq/workspace/ERISScrap/eca_sample/eca_on_scraper.rb:152:in `<main>'

Here is line 152:

#Click the form button
agent.page.forms[0].click_button

Alternatively, I tried the snippet below and keep getting the same exception:

#get the form
form = agent.page.form_with(:name => "AdvancedSearchForm")
# get the button you want from the form
button = form.button_with(:value => "Search")
# submit the form using that button
agent.submit(form, button)

Thanks for your help.

Solution

I have run into this issue many times. The way I handle it is to wrap the block of code that runs the scraper in a rescue clause; on error I simply kill the connection, then reset the agent and its headers. This has worked 100% of the time and has given me no issues, and I then carry on where I left off. The example below is a scraper I run that iterates over a list of buildings and looks up pages, etc.:

def begin_scraping_list
  Building.all.each do |building_info|
    begin
      next if convert_boroughs_for_form(building_info) == :no_good
      fill_in_first_page_address_form_and_submit(building_info)
      get_to_proper_second_page
      go_to_page_we_want_for_scraping
      scrape_the_table(building_info)
    rescue
      puts "error happened"
      # Drop the dead persistent connections and rebuild the agent with fresh headers
      @agent.shutdown
      @agent = Mechanize.new { |agent| agent.user_agent_alias = 'Windows Chrome' }
      @agent.request_headers
      sleep(5)
      # Retry the current building from the top of the block
      redo
    end
  end
end
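
If you would rather cap the number of retries instead of looping indefinitely, the same pattern can be factored into a small wrapper. This is only a sketch: with_fresh_agent_on_reset and its retry limit are names I made up for illustration, and it assumes the same @agent instance variable used above.

require 'mechanize'

# Retry the given block a limited number of times, rebuilding the Mechanize
# agent whenever the persistent connection is reset or a read times out.
def with_fresh_agent_on_reset(max_retries = 3)
  attempts = 0
  begin
    yield
  rescue Net::HTTP::Persistent::Error, Net::ReadTimeout => e
    attempts += 1
    raise if attempts > max_retries
    puts "#{e.class} raised, retrying (#{attempts}/#{max_retries})"
    @agent.shutdown if @agent
    @agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
    sleep(5)
    retry
  end
end

Each scraping step can then be wrapped, e.g. with_fresh_agent_on_reset { scrape_the_table(building_info) }.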

So in your case you would want to wrap the problem area you posted in a rescue block:

begin
  # get the form
  form = agent.page.form_with(:name => "AdvancedSearchForm")
  # get the button you want from the form
  button = form.button_with(:value => "Search")
  # submit the form using that button
  agent.submit(form, button)
rescue
  # Kill the stale connections and start over with a fresh agent
  agent.shutdown
  agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
  agent.request_headers
  sleep(2)
  # NOTE: a brand-new agent has no current page, so you must re-visit the page
  # that contains the form (agent.get(...)) before agent.page will return it
  # get the form
  form = agent.page.form_with(:name => "AdvancedSearchForm")
  # get the button you want from the form
  button = form.button_with(:value => "Search")
  # submit the form using that button
  agent.submit(form, button)
end
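
For example, using Ruby's retry keyword avoids repeating the form lookup in the rescue branch. This is only a sketch: search_page_url is a placeholder for whatever URL you originally loaded to reach AdvancedSearchForm, and you may want to cap the retries as in the wrapper above.

begin
  form   = agent.page.form_with(:name => "AdvancedSearchForm")
  button = form.button_with(:value => "Search")
  agent.submit(form, button)
rescue Net::HTTP::Persistent::Error, Net::ReadTimeout
  agent.shutdown
  agent = Mechanize.new { |a| a.user_agent_alias = 'Windows Chrome' }
  sleep(2)
  agent.get(search_page_url)  # placeholder: reload the page that contains the form
  retry                       # re-runs the form lookup and submit with the fresh agent
end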

NOTE: If Mechanize is causing problems and you're not already heavily invested in it, you'll have a far better scraping experience with Capybara and PhantomJS. They are more mature and have more development effort behind them.
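
For completeness, here is a minimal Capybara + Poltergeist (PhantomJS) sketch of the same kind of form submission. The URL, field label and button label are placeholders chosen for illustration, not values from the question:

require 'capybara'
require 'capybara/poltergeist'

# Register PhantomJS (via the poltergeist gem) as a Capybara driver
Capybara.register_driver :poltergeist do |app|
  Capybara::Poltergeist::Driver.new(app, :js_errors => false)
end

session = Capybara::Session.new(:poltergeist)
session.visit('http://example.com/advanced-search')   # placeholder URL
session.fill_in('SearchTerm', :with => 'some query')  # placeholder field label
session.click_button('Search')                        # placeholder button label
puts session.html                                      # full page source for scraping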
