HTTPError:HTTP错误403:禁止 [英] HTTPError: HTTP Error 403: Forbidden
本文介绍了HTTPError:HTTP错误403:禁止的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!
问题描述
我制作了一个供个人使用的python脚本,但不适用于Wikipedia ...
I making a python script for personal use but it's not working for wikipedia...
这项工作:
import urllib2, sys
from bs4 import BeautifulSoup
site = "http://youtube.com"
page = urllib2.urlopen(site)
soup = BeautifulSoup(page)
print soup
这不起作用:
import urllib2, sys
from bs4 import BeautifulSoup
site= "http://en.wikipedia.org/wiki/StackOverflow"
page = urllib2.urlopen(site)
soup = BeautifulSoup(page)
print soup
这是错误:
Traceback (most recent call last):
File "C:\Python27\wiki.py", line 5, in <module>
page = urllib2.urlopen(site)
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 406, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 444, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 378, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 403: Forbidden
推荐答案
在当前代码中:
import urllib2, sys
from BeautifulSoup import BeautifulSoup
site= "http://en.wikipedia.org/wiki/StackOverflow"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = urllib2.Request(site,headers=hdr)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)
print soup
Python 3.X
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
site= "http://en.wikipedia.org/wiki/StackOverflow"
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site,headers=hdr)
page = urlopen(req)
soup = BeautifulSoup(page)
print(soup)
带有Selenium的Python 3.X(执行Javascript函数)
from selenium import webdriver as driver
browser = driver.PhantomJS()
p = browser.get("http://en.wikipedia.org/wiki/StackOverflow")
assert "Stack Overflow - Wikipedia" in browser.title
修改后的版本起作用的原因是因为Wikipedia检查User-Agent是否为流行浏览器"
The reason modified version works is because Wikipedia checks for User-Agent to be of "popular browser"
这篇关于HTTPError:HTTP错误403:禁止的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!
查看全文