urllib2.urlopen cannot get image, but browser can


Question

There is a link to a GIF image, but urllib2 can't download it.

import urllib.request as urllib2
uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'
try:
  req = urllib2.Request(uri, headers={ 'User-Agent': 'Mozilla/5.0' })
  file = urllib2.urlopen(req)
except urllib2.HTTPError as err:
  print('HTTP error!!!')
  file = err 
  print(err.code)
except urllib2.URLError as err:
  print('URL error!!!')
  print(err.reason)
  raise SystemExit(1)  # 'return' is only valid inside a function

data = file.read(1024)
print(data)

After the script finishes, data remains empty. Why does this happen? There is no HTTPError, and in the browser console I can see a valid GIF and an HTTP response status of 200 OK. Thank you.

Recommended answer

You should check all the headers that the browser sends to the server.

This page needs two headers: User-Agent and Cookie.

If you open DevTools in Chrome or Firefox, you will see that a browser that has no cookie yet first receives a 302 Moved Temporarily response that sets a cookie and redirects to the same URL. The browser then repeats the request with the cookie and receives the image.

You can try my cookie and it may return the image. Normally, though, you have to make two requests: the first to get the cookie and the second (with the cookie) to get the image.

import urllib.request as urllib2

uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'

headers = {
    'User-Agent': 'Mozilla/5.0',
    'Cookie': 'JEB2=583077046E650E2495131DE8FD2F1371',
}

try:
  req = urllib2.Request(uri, headers=headers)
  f = urllib2.urlopen(req)
except urllib2.HTTPError as err:
  print('HTTP error!!!')
  f = err 
  print(err.code)
except urllib2.URLError as err:
  print('URL error!!!')
  print(err.reason)
  raise SystemExit(1)  # no response object to read from

data = f.read(1024)
print(data)
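The two-request flow described above (a first response that sets a cookie and redirects, then a second request that sends the cookie back) can also be handled without hardcoding a cookie value. The following is a minimal sketch using the standard library's `http.cookiejar.CookieJar` and `urllib.request.HTTPCookieProcessor`. The helper names `build_cookie_opener` and `fetch_with_cookies` are hypothetical, and it assumes the server behaves as described in this answer; it has not been checked against the live URL.

```python
import http.cookiejar
import urllib.request


def build_cookie_opener():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    # Same User-Agent header as in the examples above.
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
    return opener, jar


def fetch_with_cookies(url, timeout=10):
    """Hypothetical helper: fetch url and return the response body.

    The cookie processor records any Set-Cookie header from the 302
    response, and the redirected request is sent with that cookie.
    """
    opener, _jar = build_cookie_opener()
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

Calling `fetch_with_cookies(uri)` with the `uri` from the question would then perform both requests in one call, provided the server still responds as it did when this answer was written.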

If you use the requests module, it follows the redirect and sends the cookie automatically, so you don't need to make two requests yourself.

import requests

uri = 'http://ums.adtechjp.com/mapuser?providerid=1074;userid=AapfqIzytwl7ks8AA_qiU_BNUs8AAAFYqnZh4Q'

headers = {
    'User-Agent': 'Mozilla/5.0',
}

r = requests.get(uri, headers=headers)

print(r.content)
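To confirm that the bytes returned are really the GIF rather than an empty body or an error page, one option is to check the file signature. This is a small sketch; the function name `looks_like_gif` is hypothetical, and it only checks the 6-byte header that GIF files start with (`GIF87a` or `GIF89a`), not the rest of the file.

```python
def looks_like_gif(data: bytes) -> bool:
    """Return True if data starts with a GIF87a or GIF89a signature."""
    return data[:6] in (b'GIF87a', b'GIF89a')
```

For example, `looks_like_gif(r.content)` could be checked before writing `r.content` to a file opened in binary mode.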
