Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

Question

I have a strange bug when trying to urlopen a certain page from Wikipedia. This is the page:

http://en.wikipedia.org/wiki/OpenCola_(drink)

This is the shell session:

>>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
Traceback (most recent call last):
  File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "c:\Python26\Lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "c:\Python26\Lib\urllib2.py", line 397, in open
    response = meth(req, response)
  File "c:\Python26\Lib\urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\Python26\Lib\urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "c:\Python26\Lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "c:\Python26\Lib\urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

This happened to me on two different systems in different continents. Does anyone have an idea why this happens?

Recommended answer

Wikipedia's stance is:


Data retrieval: Bots may not be used to retrieve bulk content for any use not directly related to an approved bot task. This includes dynamically loading pages from another website, which may result in the website being blacklisted and permanently denied access. If you would like to download bulk content or mirror a project, please do so by downloading or hosting your own copy of our database.

That is why requests from Python's default `urllib2` client are blocked. For bulk content, you're supposed to download the data dumps instead.
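
To see what the server sees, it can help to print the User-Agent that Python sends when you do not set one. The snippet below is a minimal sketch using Python 3's `urllib.request` (the successor to `urllib2`); that Wikipedia filters on this particular header value is an assumption drawn from the behaviour described here, not something the snippet proves.

```python
import urllib.request

# build_opener() returns an OpenerDirector whose addheaders list holds
# the headers added to every request that does not already set them.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# Prints something like "Python-urllib/3.x", depending on the interpreter version.
print(default_headers["User-agent"])
```

Because the default value only says "Python-urllib" plus a version number, any server that rejects generic script clients can recognise it easily.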

Anyways, you can read pages like this in Python 2:

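import urllib2

url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
req = urllib2.Request(url, headers={'User-Agent': 'Magic Browser'})  # override the default User-Agent
con = urllib2.urlopen(req)
print con.read()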

Or in Python 3:

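import urllib.request  # a bare "import urllib" does not reliably load the request submodule

url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
req = urllib.request.Request(url, headers={'User-Agent': 'Magic Browser'})  # override the default User-Agent
con = urllib.request.urlopen(req)
print(con.read())  # print is a function in Python 3; read() returns bytes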
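
If you want the 403 reported as a short message instead of the traceback from the question, the Python 3 version can be wrapped as below. This is only a sketch: `build_request`, `fetch`, and the `USER_AGENT` string are hypothetical names and values made up for illustration, and the UTF-8 fallback is an assumption used when the response does not declare a charset.

```python
import urllib.error
import urllib.request

# Hypothetical identifier for illustration; replace it with your own tool name and contact.
USER_AGENT = "ExampleFetcher/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT, timeout=10):
    """Return the page body as text, or None if the server answers with an HTTP error."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as con:
            # Fall back to UTF-8 when the response does not declare a charset.
            charset = con.headers.get_content_charset() or "utf-8"
            return con.read().decode(charset)
    except urllib.error.HTTPError as err:
        # err.code is the status (403 in the question); err.reason is the short message.
        print("HTTP Error %d: %s" % (err.code, err.reason))
        return None
```

Calling `fetch('http://en.wikipedia.org/wiki/OpenCola_(drink)')` would then return the page text, or print the status line and return `None` if the server still refuses the request.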
