How to use urllib to download an image from the web
Problem description
I'm trying to download an image using this code:
from urllib import urlretrieve
urlretrieve('http://gdimitriou.eu/wp-content/uploads/2008/04/google-image-search.jpg',
'google-image-search.jpg')
It worked. The image was downloaded and can be opened by any image viewer.
However, the code below does not work. The downloaded file is only 2 KB and cannot be opened by any image viewer.
from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg',
'Zindagi1976.jpg')
The downloaded file turns out to be an HTML error page. Here is its content:
ERROR
The requested URL could not be retrieved
While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg
The following error was encountered:
Access Denied.
Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.
Your cache administrator is nobody.
Generated Mon, 05 Dec 2011 17:19:53 GMT by sq56.wikimedia.org (squid/2.7.STABLE9)
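A 2 KB file that no viewer can open suggests that the server sent something other than image data. As a minimal sketch, assuming the expected file is a JPEG, the first bytes of the saved file can be compared with the JPEG start-of-image marker. The helper name `looks_like_jpeg` is hypothetical and not part of urllib.

```python
def looks_like_jpeg(path):
    """Return True if the file begins with the JPEG start-of-image marker."""
    with open(path, 'rb') as f:
        # JPEG data starts with the bytes FF D8 FF; an HTML error page
        # typically starts with '<' instead.
        return f.read(3) == b'\xff\xd8\xff'

# Usage (assumes the file was already saved by urlretrieve):
# looks_like_jpeg('Zindagi1976.jpg')
```

A check like this only inspects the first three bytes, so it catches an HTML error page saved under a .jpg name but does not prove the image is complete or undamaged.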
If you use the following command, the image downloads fine:
wget http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg
But if you do the following:
from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg',
'Zindagi1976.jpg')
You may not be able to download the image. This is probably because Wikipedia has rules (robots.txt, or access controls on its servers) that deny robots and other unknown clients, and urllib announces itself with its own default User-Agent. Try emulating a browser.
To do that, you have to add the following as part of the request headers:
('User-agent',
 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
 'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')
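The (name, value) tuple shown here has the same shape as the entries in an opener's `addheaders` list. The following is a minimal sketch, assuming Python 3 (where these functions live in `urllib.request` rather than `urllib`); the names `BROWSER_UA` and `make_opener` are hypothetical. It prints the header urllib sends by default and then builds an opener that sends the browser-like string instead.

```python
import urllib.request

# Assumption: any browser-like string will do; this one is the value quoted above.
BROWSER_UA = ('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
              'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')

def make_opener(user_agent=BROWSER_UA):
    """Return an opener whose default User-agent header is replaced."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples sent with every request.
    opener.addheaders = [('User-agent', user_agent)]
    return opener

# The library's own default is a 'Python-urllib/x.y' string.
default_opener = urllib.request.build_opener()
print(default_opener.addheaders)

opener = make_opener()
print(opener.addheaders)
# opener.open(url) would now send the browser-like header
# (a network call, so it is not executed here).
```

Calling `urllib.request.install_opener(opener)` would make later `urlretrieve` calls in the same process use this opener as well.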
You can do something like this:
>>> from urllib import FancyURLopener
>>> class MyOpener(FancyURLopener):
...     version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
...
>>> myopener = MyOpener()
>>> myopener.retrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')
('Zindagi1976.jpg', <httplib.HTTPMessage instance at 0x1007bfe18>)
This retrieves the file, because the request now carries a browser-like User-Agent.
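The session above is Python 2, where `FancyURLopener` is imported from `urllib`. On Python 3 that class is deprecated, so a rough equivalent is sketched here under that assumption: it attaches the User-Agent to a single `urllib.request.Request` and streams the response to disk. The names `USER_AGENT`, `build_request`, and `download` are hypothetical.

```python
import shutil
import urllib.request

# Assumption: any browser-like string will do; this one is copied from the session above.
USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
              'Gecko/20071127 Firefox/2.0.0.11')

def build_request(url, user_agent=USER_AGENT):
    """Create a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})

def download(url, filename, user_agent=USER_AGENT):
    """Stream the response body to filename and return the filename."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as response, open(filename, 'wb') as out:
        shutil.copyfileobj(response, out)
    return filename

# Usage (performs a network request, so it is left commented out):
# download('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg',
#          'Zindagi1976.jpg')
```

Setting the header per request keeps the change local to this download instead of altering how every other urllib call in the program identifies itself.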