urllib2.urlopen cannot get image, but browser can
Question
There is a link to a GIF image, but urllib2 can't download it.
import urllib.request as urllib2
uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'
try:
    req = urllib2.Request(uri, headers={'User-Agent': 'Mozilla/5.0'})
    file = urllib2.urlopen(req)
except urllib2.HTTPError as err:
    # An HTTPError is also a response object, so its body can still be read
    print('HTTP error!!!')
    file = err
    print(err.code)
except urllib2.URLError as err:
    print('URL error!!!')
    print(err.reason)
    # No response to read; a bare `return` is only valid inside a function
    raise SystemExit(1)
data = file.read(1024)
print(data)
After the script finishes, data remains empty. Why does this happen? No HTTPError is raised, and in the browser console I can see that there is a valid GIF and that the HTTP response status is 200 OK. Thank you.
Recommended answer
You should check all the headers that the browser sends to the server.
This page needs two headers: User-Agent and Cookie.
If you use DevTools in Chrome or Firefox, you will see that a browser that has no cookie yet first receives a 302 Moved Temporarily response that sets a cookie and redirects to the same URL. The browser then repeats the request with the cookie and receives the image.
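To check this outside the browser, one option is to look at the first response without following the redirect. The sketch below uses http.client from the standard library, which does not follow redirects. The helper name first_response_info is hypothetical, and the result depends on the server still behaving as described above.

```python
import http.client
from urllib.parse import urlsplit


def first_response_info(url, user_agent="Mozilla/5.0", timeout=10):
    """Return (status, Location, Set-Cookie) of the first response only.

    http.client never follows redirects, so a 302 is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=timeout)

    # Rebuild the request target (path plus query string)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query

    try:
        conn.request("GET", target, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.getheader("Location"), resp.getheader("Set-Cookie")
    finally:
        conn.close()
```

If the first element is 302 and the third is not None, the server is asking the client to come back with that cookie, which a plain urlopen call does not do.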
You can try my cookie and maybe it will return the image. But normally you have to make two requests: the first to get the cookie and the second (with the cookie) to get the image.
import urllib.request as urllib2
uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'
headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'JEB2=583077046E650E2495131DE8FD2F1371',
}
f = None  # stays None if the request fails before any response arrives
try:
    req = urllib2.Request(uri, headers=headers)
    f = urllib2.urlopen(req)
except urllib2.HTTPError as err:
    # An HTTPError is also a response object, so its body can still be read
    print('HTTP error!!!')
    f = err
    print(err.code)
except urllib2.URLError as err:
    print('URL error!!!')
    print(err.reason)
if f is not None:
    data = f.read(1024)
    print(data)
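As an alternative to copying a cookie by hand, urllib can store and resend cookies itself. The following is a minimal sketch, assuming the server behaves as described (a 302 that sets a cookie, then the image once the cookie is sent back). It combines http.cookiejar.CookieJar with urllib.request.HTTPCookieProcessor; the function name fetch_with_cookies is hypothetical.

```python
import http.cookiejar
import urllib.request


def fetch_with_cookies(url, user_agent="Mozilla/5.0", timeout=10):
    """Fetch url with an opener that remembers cookies across redirects.

    Returns (status, body, jar) for the final response.
    """
    jar = http.cookiejar.CookieJar()
    # HTTPCookieProcessor stores Set-Cookie values from each response and
    # adds matching cookies to later requests, including redirected ones.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(req, timeout=timeout) as resp:
        return resp.status, resp.read(), jar
```

Calling fetch_with_cookies(uri) with the URL from the question should return the image bytes if the server still responds this way; writing them to a file opened in 'wb' mode saves the GIF.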
If you use the requests module, it handles the cookie and the redirect automatically, so you do not need to make the two requests yourself.
import requests
uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'
headers = {
    'User-Agent': 'Mozilla/5.0',
}
r = requests.get(uri, headers=headers)
print(r.content)