Adjusting timeouts for Nokogiri connections

Question

Why does Nokogiri wait for a couple of seconds (3-5) when the server is busy and I'm requesting pages one by one, but when these requests are made in a loop, Nokogiri does not wait and throws the timeout message? I'm wrapping the request in a timeout block, but Nokogiri does not wait for that long at all. Is there a suggested way to handle this?

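require 'nokogiri'
require 'open-uri'
require 'timeout'

# this is a method from the eng class
def get_page(url, page_type)
  # Timeout.timeout replaces the deprecated Kernel#timeout
  Timeout.timeout(10) do
    # Get a Nokogiri::HTML::Document for the page we're interested in...
    # (URI.open is needed on Ruby 3.0+, where Kernel#open no longer opens URLs)
    @@doc = Nokogiri::HTML(URI.open(url))
  end
rescue Timeout::Error
  puts "Time out connection request"
  raise
end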

# this is a snippet from the main app calling the eng class
# receives a list of urls and goes through them, requesting one by one
def retrieve_in_loop(links)
  # each_with_index stops at the last element; (0..links.length) ran one past it
  links.each_with_index do |url, idx|
    puts "Visiting link #{idx} of #{links.length}"
    puts "link: #{url}"
    begin
      @@eng.get_page(url, product)
    rescue StandardError => e # rescuing Exception would also swallow Ctrl-C
      puts "Error getting url: #{idx} #{url} (#{e.class}: #{e.message})"
      puts "This link will be skipped. Continuing with next one"
    end
  end
end

Recommended answer

The timeout block simply sets the maximum time the code inside the block may run before a Timeout::Error is raised. It does not change anything inside Nokogiri or OpenURI.
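
To illustrate the point with nothing but Ruby's standard timeout library, here is a minimal sketch. The method slow_library_call is hypothetical: it stands in for library code, such as a socket read, that enforces its own shorter deadline. The 10-second outer block does not extend that inner deadline, so the inner Timeout::Error escapes after roughly 0.1 seconds.

```ruby
require 'timeout'

# Hypothetical stand-in for library code (such as a socket read) that applies
# its own 0.1-second deadline internally, regardless of what the caller set.
def slow_library_call
  Timeout.timeout(0.1) { sleep 1 }
end

# Wraps the call in a generous 10-second outer timeout, as get_page does,
# and returns [outcome, elapsed_seconds].
def run_with_outer_timeout
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  outcome =
    begin
      Timeout.timeout(10) { slow_library_call }
      :completed
    rescue Timeout::Error => e
      e.class
    end
  [outcome, Process.clock_gettime(Process::CLOCK_MONOTONIC) - started]
end

outcome, elapsed = run_with_outer_timeout
puts "#{outcome} after #{elapsed.round(2)} s"
```

The outer block only guarantees an upper bound; anything inside it is free to give up sooner.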

You can set the timeout to a year, but OpenURI can still time out earlier on its own terms, because the underlying Net::HTTP connection applies its own timeouts.

So your problem is most likely that OpenURI is timing out on the connection attempt itself. Nokogiri has no timeouts; it's just a parser.
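
Since Nokogiri only parses what it is given, a timeout in the loop comes from the HTTP layer. OpenURI fetches http(s) URLs with Net::HTTP, which raises Net::OpenTimeout when the connection is not established in time and Net::ReadTimeout when the server connects but does not send data in time; both are subclasses of Timeout::Error. The sketch below, using a hypothetical helper named timeout_phase, reports which phase failed so you know which timeout to adjust.

```ruby
require 'net/http'

# Hypothetical helper: maps an exception raised while fetching a page to the
# phase that timed out. The specific classes are tested before Timeout::Error
# because both Net::OpenTimeout and Net::ReadTimeout inherit from it.
def timeout_phase(error)
  case error
  when Net::OpenTimeout then :connect # no connection within open_timeout
  when Net::ReadTimeout then :read    # connected, but no data within read_timeout
  when Timeout::Error   then :other   # e.g. an enclosing Timeout.timeout block
  else :not_a_timeout
  end
end
```

In retrieve_in_loop, for example, the rescue branch could print timeout_phase(e) next to the URL.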

Adjusting the read timeout

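The only timeout you can adjust on OpenURI here is the read timeout. It seems you cannot change the connection timeout through this method in the open-uri version discussed here (newer open-uri releases also accept an :open_timeout option, which is passed on to Net::HTTP#open_timeout):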

require 'open-uri'
URI.open(url, read_timeout: 10) # plain open(url) stopped fetching URLs in Ruby 3.0
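
The read timeout can be observed locally without waiting for a busy remote server. In this sketch, which is an illustration rather than part of the original answer, a TCPServer on 127.0.0.1 accepts connections but never replies, so URI.open with read_timeout raises Net::ReadTimeout. The 0.5-second value is an arbitrary choice to keep the demo fast, and proxy: false is added only so that an http_proxy environment variable cannot redirect the local request.

```ruby
require 'net/http'
require 'open-uri'
require 'socket'

# Local stand-in for an overloaded server: the OS completes the TCP handshake,
# so the connection succeeds, but the server never sends a response.
server = TCPServer.new('127.0.0.1', 0)
url = "http://127.0.0.1:#{server.addr[1]}/"

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
outcome =
  begin
    # proxy: false bypasses any http_proxy setting, since the target is local
    URI.open(url, read_timeout: 0.5, proxy: false, &:read)
    :completed
  rescue Net::ReadTimeout => e
    e.class
  end
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
server.close

# Net::HTTP retries an idempotent GET once by default, so the elapsed time is
# typically between one and two times read_timeout.
puts "#{outcome} after #{elapsed.round(2)} s"
```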

Adjusting the connection timeout

To adjust the connection timeout without relying on open-uri options, use Net::HTTP directly instead:

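require 'net/http'
require 'uri'
require 'nokogiri'

uri = URI.parse(url)

http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = (uri.scheme == 'https') # required for https:// URLs
http.open_timeout = 10 # seconds allowed to establish the connection
http.read_timeout = 10 # seconds allowed for each read from the socket

# request_uri keeps the query string, which uri.path would drop
response = http.get(uri.request_uri)

Nokogiri::HTML(response.body)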
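
Applied to the loop from the question, the answer's approach could look roughly like this. It is a sketch, not the asker's code: fetch_html and fetch_all are hypothetical names, the 10-second defaults mirror the answer, and parsing is left to the caller (for example Nokogiri::HTML(body)), so all timeouts live in the HTTP layer. A URL whose connection or read times out is reported and skipped instead of aborting the run.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: fetches url with explicit connect/read timeouts
# (in seconds) and returns the response body as a String.
def fetch_html(url, open_timeout: 10, read_timeout: 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.open_timeout = open_timeout
  http.read_timeout = read_timeout
  http.get(uri.request_uri).body
end

# Hypothetical helper: visits each URL in order and returns one body per URL,
# or nil where the connection or a read timed out.
def fetch_all(urls, **timeouts)
  urls.each_with_index.map do |url, idx|
    begin
      fetch_html(url, **timeouts)
    rescue Net::OpenTimeout, Net::ReadTimeout => e
      warn "Timed out on link #{idx} (#{url}): #{e.class}; skipping"
      nil
    end
  end
end
```

Calling fetch_all(links, open_timeout: 20, read_timeout: 30), for instance, gives a slow server more time to accept the connection and to respond, with no Timeout.timeout wrapper involved.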

You can also take a look at some additional discussion here:

Ruby Net::HTTP timeout
Increase timeout for Net::HTTP
