“ urllib.error.HTTPError:HTTP错误404:找不到”蟒蛇 [英] "urllib.error.HTTPError: HTTP Error 404: Not Found" Python
问题描述
我正在尝试使用urllib.request.open函数打开此网页:
https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//
I'm trying to open this webpage with the urllib.request.open function: "https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//"
我可以使用常规浏览器访问此网页,仍然可以使用urrlib.request.open函数返回HTTP错误404:
I can access this webpage with my regular browser, still with the urrlib.request.open function it returns HTTP error 404:
import urllib.request
page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
print(page)
我收到以下错误:
Traceback (most recent call last):
File "/Users/markmouawad/Documents/consu_programa/scrapper.py", line 4, in <module>
page = urllib.request.urlopen("https://prenotaonline.esteri.it/login.aspx?cidsede=100001&returnUrl=//").read()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 472, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 510, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
我正在使用Python 3.5.3
I'm using Python 3.5.3
推荐答案
这是制作蜘蛛人/爬行型机器人时偶然发现的第一件事。
This is a very first thing you stumble upon when making spider/crawling bots.
检测机器人的基本方法是如果请求标头包含 User-Agent
标头。
Basic way of detecting bots is if request headers contain User-Agent
header.
请尝试以下代码段:
import requests
headers = {'USER-AGENT': 'Mozilla/5.0 (iPad; U; CPU OS 3_2_1 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Mobile/7B405'}
r = requests.get(URL, headers=headers)
print r.status_code # should be 200
print r.content # should hold page content
这篇关于“ urllib.error.HTTPError:HTTP错误404:找不到”蟒蛇的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!