How to handle 404 Not Found errors in Nokogiri


Question

I am using Nokogiri to scrape web pages. A few URLs have to be guessed, and they return a 404 Not Found error when the page doesn't exist. Is there a way to catch this exception?

http://yoursite/page/38475 #=> page number 38475 doesn't exist

I tried the following, which didn't work.

url = "http://yoursite/page/38475"
doc = Nokogiri::HTML(open(url)) do
  begin
    rescue Exception => e
      puts "Try again later"
  end
end

Recommended answer

It doesn't work because the rescue does not cover the part of the code that raises the error. The exception comes from the open(url) call, which raises OpenURI::HTTPError when the server responds with a 404 status, before Nokogiri ever sees the document. The following code should work:

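require 'open-uri'
require 'nokogiri'

url = 'http://yoursite/page/38475'
begin
  # Kernel#open stopped accepting URLs in Ruby 3.0; URI.open works on Ruby 2.5+
  file = URI.open(url)
  doc = Nokogiri::HTML(file)
  # handle doc
rescue OpenURI::HTTPError => e
  if e.message == '404 Not Found'
    # handle 404 error
  else
    # any other HTTP error is not ours to handle, so re-raise it
    raise e
  end
end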
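
Comparing e.message against the exact string '404 Not Found' depends on the reason phrase the server sends. As a possible variation, the sketch below checks only the numeric status code through e.io.status. The helper name fetch_unless_404 and its fetcher parameter are hypothetical, introduced here for illustration (the fetcher makes the logic testable without a network); they are not part of open-uri or Nokogiri.

```ruby
require 'open-uri'

# Hypothetical helper: returns the response body as a String,
# or nil when the server answers 404. Any other HTTP error is re-raised.
# `fetcher` defaults to URI.open and can be replaced with a stub.
def fetch_unless_404(url, fetcher: ->(u) { URI.open(u) })
  fetcher.call(url).read
rescue OpenURI::HTTPError => e
  # e.io.status is a [code, reason] pair such as ["404", "Not Found"]
  raise unless e.io.status.first == '404'
  nil
end
```

With a helper like this, the caller would fetch with html = fetch_unless_404(url) and then parse only when a page came back: doc = Nokogiri::HTML(html) if html.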

BTW, about rescuing Exception, see: Why is it bad style to `rescue Exception => e` in Ruby?
