Download image from URL using Python urllib but receiving HTTP Error 403: Forbidden
Problem description
I want to download an image file from a URL using the Python module urllib.request. It works for some websites (e.g. mangastream.com) but not for another (mangadoom.co), where I receive the error "HTTP Error 403: Forbidden". What could be the problem in the latter case, and how can I fix it?
I am using Python 3.4 on OS X.
import urllib.request
# does not work
img_url = 'http://mangadoom.co/wp-content/manga/5170/886/005.png'
img_filename = 'my_img.png'
urllib.request.urlretrieve(img_url, img_filename)
The end of the error message says:
...
HTTPError: HTTP Error 403: Forbidden
However, it works for another website:
# works
img_url = 'http://img.mangastream.com/cdn/manga/51/3140/006.png'
img_filename = 'my_img.png'
urllib.request.urlretrieve(img_url, img_filename)
I have tried the solutions from the post below, but none of them work on mangadoom.co.
The solution here also does not fit, because my case is downloading an image: urllib2.HTTPError: HTTP Error 403: Forbidden
Non-Python solutions are also welcome. Your suggestions would be much appreciated.
Recommended answer
This website is blocking the user-agent used by urllib, so you need to change it in your request. Unfortunately, I don't think urlretrieve supports this directly.
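For readers who prefer to stay with the standard library, here is a minimal sketch of one way to make urlretrieve send a different user-agent: install a global opener whose default headers replace urllib's own. The helper names make_opener and retrieve_with_user_agent and the 'Mozilla/5.0' string are illustrative assumptions, not part of the original answer, and there is no guarantee that a particular site will accept this user-agent.

```python
import urllib.request


def make_opener(user_agent="Mozilla/5.0"):
    """Build an opener that sends `user_agent` instead of urllib's default."""
    opener = urllib.request.build_opener()
    # Replacing addheaders drops the default "Python-urllib/x.y" User-Agent.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


def retrieve_with_user_agent(url, filename, user_agent="Mozilla/5.0"):
    """Download `url` to `filename` via urlretrieve with a custom User-Agent."""
    # install_opener changes the process-wide default opener, which
    # urlopen and urlretrieve both use for subsequent requests.
    urllib.request.install_opener(make_opener(user_agent))
    return urllib.request.urlretrieve(url, filename)


# Example call (performs a network request, so it is left commented out):
# retrieve_with_user_agent(
#     "http://mangadoom.co/wp-content/manga/5170/886/005.png", "my_img.png")
```

Note that install_opener affects every later urlopen and urlretrieve call in the same process, so this approach suits small scripts better than shared library code.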
I recommend using the beautiful requests library; the code becomes (from here):
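import requests
import shutil

# stream=True avoids loading the whole image into memory at once
r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png', stream=True)
if r.status_code == 200:
    with open("img.png", 'wb') as f:
        # let urllib3 decode gzip/deflate content before it is written
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)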
Note that this website does not seem to block the requests user-agent. But if it needs to be changed, that is easy:
r = requests.get('http://mangadoom.co/wp-content/manga/5170/886/005.png',
                 stream=True, headers={'User-agent': 'Mozilla/5.0'})