urllib2.urlopen() vs urllib.urlopen() - urllib2 throws 404 while urllib works! WHY?


Question

import urllib

print urllib.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

The above script works and returns the expected results, while:

import urllib2

print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()

raises the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/urllib2.py", line 124, in urlopen
    return _opener.open(url, data)
  File "/usr/lib/python2.5/urllib2.py", line 387, in open
    response = meth(req, response)
  File "/usr/lib/python2.5/urllib2.py", line 498, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.5/urllib2.py", line 425, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.5/urllib2.py", line 360, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.5/urllib2.py", line 506, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

Does anyone know why this is? I'm running this from a laptop on my home network with no proxy settings: straight from the laptop to the router and then out to the web.

Recommended answer

That URL does indeed result in a 404, but the response carries a lot of HTML content. urllib2 is (correctly) treating it as an error condition and raising HTTPError, whereas urllib.urlopen() hands back the response body without raising, which is why the first script appears to work. You can recover the content of that site's 404 page like so:

import urllib2
try:
    print urllib2.urlopen('http://www.reefgeek.com/equipment/Controllers_&_Monitors/Neptune_Systems_AquaController/Apex_Controller_&_Accessories/').read()
except urllib2.HTTPError, e:
    print e.code
    print e.msg
    print e.headers
    print e.fp.read()
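
For readers on Python 3, where urllib2 was folded into urllib.request and urllib.error, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the local test server, the `NotFoundHandler` class, the `fetch` and `demo` helpers, and the page body are all hypothetical stand-ins so the example runs without touching the real site. The opener is built with an empty `ProxyHandler` so that proxy environment variables do not interfere with the local request.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page body standing in for the site's custom 404 page.
BODY = b"<html><body>custom 404 page</body></html>"


class NotFoundHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in server: answers every GET with 404 plus HTML."""

    def do_GET(self):
        self.send_response(404)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


def fetch(url):
    """Return (status_code, body) even when the server reports an HTTP error."""
    # An empty ProxyHandler means: connect directly, ignore proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=5) as response:
            return response.getcode(), response.read()
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so the error page is readable.
        body = e.read()
        e.close()
        return e.code, body


def demo():
    """Start the stand-in server on a free local port and fetch a missing page."""
    server = HTTPServer(("127.0.0.1", 0), NotFoundHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    try:
        port = server.server_address[1]
        return fetch("http://127.0.0.1:%d/missing" % port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    code, body = demo()
    print(code)
    print(body.decode("utf-8"))
```

Wrapping the call in a helper like `fetch` gives callers the status code and the body in one place, so they can decide for themselves whether a 404 with content should count as a failure.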
