Python3: urllib.error.HTTPError: HTTP Error 403: Forbidden

Problem description

Please help me!

I am using Python 3.3 and this code:

import urllib.request
import sys
Open_Page = urllib.request.urlopen(
        "http://wowcircle.com"
    ).read().decode().encode('utf-8')

And I get this:

Traceback (most recent call last):
  File "C:\Users\1\Desktop\WCLauncer\reg.py", line 5, in <module>
    "http://forum.wowcircle.com"
  File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 507, in error
    result = self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 692, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "C:\Python33\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

I understand that I have no access to the site wowcircle.com. But I only want to fetch the page source! I believe I can do that without access, but how?

Solution

I advise you to set the request headers accordingly. Have a look at what your browser sends (for example with an HTTP headers plugin).

A function may look like this:

import urllib  # Python 2 module; in Python 3 the class lives in urllib.request

def openAsOpera(url):
    # Python 3 equivalent: urllib.request.URLopener (deprecated since Python 3.3)
    u = urllib.URLopener()
    # Drop the default headers so only the browser-like ones below are sent
    u.addheaders = []
    u.addheader('User-Agent', 'Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01')
    u.addheader('Accept-Language', 'de-DE,de;q=0.9,en;q=0.8')
    u.addheader('Accept', 'text/html, application/xml;q=0.9, application/xhtml+xml, image/png, image/webp, image/jpeg, image/gif, image/x-xbitmap, */*;q=0.1')
    f = u.open(url)
    content = f.read()
    f.close()
    return content

This gets you around some errors on webpages that expect more from a client than a bare default request provides.
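
On Python 3, the same idea can be sketched with `urllib.request.Request`, which accepts a `headers` dictionary. This is a minimal sketch, not the answer's original code: the helper names `build_browser_request` and `open_as_browser` are hypothetical, and the header values are taken from the function above (with a shortened `Accept` value). Whether a given site accepts these headers is something you have to try.

```python
import urllib.request

# Header values taken from the openAsOpera function above (Accept is shortened).
BROWSER_HEADERS = {
    'User-Agent': 'Opera/9.80 (Windows NT 6.1; WOW64; U; de) Presto/2.10.289 Version/12.01',
    'Accept-Language': 'de-DE,de;q=0.9,en;q=0.8',
    'Accept': 'text/html, application/xml;q=0.9, application/xhtml+xml, */*;q=0.1',
}


def build_browser_request(url):
    """Return a Request that carries browser-like headers (hypothetical helper)."""
    return urllib.request.Request(url, headers=BROWSER_HEADERS)


def open_as_browser(url, timeout=10):
    """Fetch url with the headers above and return the raw response bytes."""
    request = build_browser_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Building the request separately from opening it makes it possible to inspect the headers without touching the network.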

Now I get this error:

Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    s = openAsOpera('http://wowcircle.com/')
  File "C:....pyw", line 522, in openAsOpera
    f = u.open(url)
  File "C:\Python27\lib\urllib.py", line 208, in open
    return getattr(self, name)(url)
  File "C:\Python27\lib\urllib.py", line 359, in open_http
    return self.http_error(url, fp, errcode, errmsg, headers)
  File "C:\Python27\lib\urllib.py", line 376, in http_error
    return self.http_error_default(url, fp, errcode, errmsg, headers)
  File "C:\Python27\lib\urllib.py", line 381, in http_error_default
    raise IOError, ('http error', errcode, errmsg, headers)
IOError: ('http error', 302, 'Moved Temporarily', <httplib.HTTPMessage instance at 0x02C8F1C0>)

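This means that you now get access, because the request looks like it comes from a real browser. The 403 is gone; the server answers with a 302 redirect instead, which the basic URLopener does not follow.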

>>> try: s = openAsOpera('http://wowcircle.com/?pmtry=1')
... except: import sys; ty, err, tb = sys.exc_info()
...

>>> err.args[3].headers
['Server: nginx\r\n', 'Date: Sat, 05 Apr 2014 07:42:00 GMT\r\n', 'Content-Type: text/html\r\n', 'Content-Length: 154\r\n', 'Connection: close\r\n', 'Set-Cookie: PMBC=9979187990a58a5bfdaa6d1380ad6156; path=/\r\n', 'Location: http://wowcircle.com/?pmtry=1\r\n']
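
In Python 3 the equivalent information is available on the `urllib.error.HTTPError` exception itself, through its `code` and `headers` attributes. A small sketch of pulling out the fields that matter here; `summarize_http_error` is a hypothetical helper name, not part of the original answer.

```python
import urllib.error


def summarize_http_error(err):
    """Collect the status code plus the redirect and cookie headers of an HTTPError."""
    return {
        'code': err.code,
        'location': err.headers.get('Location'),
        'set_cookie': err.headers.get('Set-Cookie'),
    }
```

Call it from an `except urllib.error.HTTPError as err:` block around `urllib.request.urlopen` to see where the server is redirecting and which cookie it wants set.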

One thing to notice there: the redirect goes to this location: http://wowcircle.com/?pmtry=1 and then to this one: http://wowcircle.com/?pmtry=2. The counter goes up with each redirect, and the server seems to be waiting for the cookie to be sent back.

So the result of my analysis is: do not forget to send the cookie every time you access the site.
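
One way to do this in Python 3 is with `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor`, which store cookies from responses and send them back on later requests, including redirects followed by the same opener. This is a sketch under assumptions rather than a verified fix for this site: `make_cookie_opener` and `fetch_with_cookies` are hypothetical helper names, and whether the redirect loop ends once the `PMBC` cookie is returned has not been confirmed here.

```python
import http.cookiejar
import urllib.request


def make_cookie_opener(user_agent):
    """Build an opener that stores cookies and sends them back on later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    # Replace the default "Python-urllib/x.y" User-Agent with a browser-like one.
    opener.addheaders = [('User-Agent', user_agent)]
    return opener, jar


def fetch_with_cookies(url, user_agent, timeout=10):
    """Fetch url, following redirects while keeping cookies between hops."""
    opener, jar = make_cookie_opener(user_agent)
    with opener.open(url, timeout=timeout) as response:
        body = response.read()
    # Return the cookie names too, so the caller can see what the server set.
    return body, [cookie.name for cookie in jar]
```

Reusing the same opener for every request to the site keeps the cookie jar alive, which is what "send the cookie every time" amounts to in practice.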
