How to get around HTTP Error 403: Forbidden with urllib.request in Python 3


Question

Hi. Not every time, but sometimes when I try to fetch the LSE (London Stock Exchange) page I get the ever-annoying HTTP Error 403: Forbidden message.

Does anyone know how I can overcome this using only standard Python modules (so, sadly, no Beautiful Soup)?

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"
infile = urllib.request.urlopen(url) # Open the URL
data = infile.read().decode('ISO-8859-1') # Read the content as string decoded with ISO-8859-1

print(data) # Print the data to the screen

However, every now and then this is the error I get:

Traceback (most recent call last):
  File "/home/ubuntu/workspace/programming_practice/Assessment/Summative/removingThe403Error.py", line 5, in <module>
    webpage = urlopen(req).read().decode('ISO-8859-1')
  File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
    response = meth(req, response)
  File "/usr/lib/python3.4/urllib/request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python3.4/urllib/request.py", line 507, in error
    return self._call_chain(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.4/urllib/request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden


Process exited with code: 1

Link to a list of all the modules that are allowed: https://docs.python.org/3.4/py-modindex.html

Many thanks.

Recommended answer

This is probably due to mod_security or a similar server-side filter. By default urllib sends a User-Agent of the form Python-urllib/3.x, which such filters can reject. You need to spoof a browser by sending a browser-like User-Agent header with the request.

Here is your code, corrected:

import urllib.request

url = "http://www.londonstockexchange.com/exchange/prices-and-markets/stocks/indices/ftse-indices.html"

# Open the URL as a browser, not as Python urllib
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')  # Decode the response bytes to a string using ISO-8859-1

print(data) # Print the data to the screen
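
The same idea can be wrapped in a small helper that also reports the failure instead of ending with a traceback when the server still refuses the request. This is only a sketch: the function names `build_request` and `fetch`, the 10-second timeout, and the choice to return `None` on an HTTP error are assumptions made for illustration, not part of the original answer.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent="Mozilla/5.0"):
    """Return a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, encoding="ISO-8859-1", timeout=10):
    """Return the decoded page body, or None if the server replies with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read().decode(encoding)
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (e.g. 403); err.reason is its text (e.g. "Forbidden")
        print("Request failed:", err.code, err.reason)
        return None
```

Calling `fetch(url)` with the URL from the question should behave like the corrected code above, except that a 403 is printed and `None` is returned rather than an exception being raised. Other failures, such as a DNS error, still propagate as `urllib.error.URLError`.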

Next, you can use BeautifulSoup to scrape the HTML.
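
Since the question rules out third-party packages, the standard library's `html.parser` module is one possible substitute for BeautifulSoup. The sketch below collects the `href` of every link in a page. The class name `LinkCollector` and the helper `extract_links` are hypothetical, and which elements you actually need from the LSE page is not stated in the question, so links are used purely as an example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(html_text):
    """Return all link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links
```

Passing the decoded `data` string from the code above to `extract_links(data)` would return the list of link targets on the page.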
