How to use urllib to download an image from the web

Problem description

I'm trying to download an image using this code:

from urllib import urlretrieve
urlretrieve('http://gdimitriou.eu/wp-content/uploads/2008/04/google-image-search.jpg', 
            'google-image-search.jpg')

It worked. The image was downloaded and can be opened by any image viewer.


However, the code below does not work. The downloaded file is only 2 KB and cannot be opened by any image viewer.

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 
            'Zindagi1976.jpg')

The downloaded file turns out to be an HTML error page:

    ERROR

The requested URL could not be retrieved

While trying to retrieve the URL: http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

The following error was encountered:

Access Denied.
Access control configuration prevents your request from being allowed at this time. Please contact your service provider if you feel this is incorrect.

Your cache administrator is nobody. 
Generated Mon, 05 Dec 2011 17:19:53 GMT by sq56.wikimedia.org (squid/2.7.STABLE9)
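
A 2 KB file that no viewer can open is a hint that the server returned something other than an image. One quick check, sketched here in Python 3 with a hypothetical helper named classify_download, is to look at the first bytes of the file: JPEG data starts with the bytes FF D8 FF, while an error page like the one above starts with HTML markup.

```python
def classify_download(head):
    """Guess what a downloaded file is from its first bytes.

    `head` is a bytes object holding the start of the file.
    Returns 'jpeg', 'html', or 'unknown'.
    """
    # Every JPEG stream begins with the start-of-image marker FF D8,
    # immediately followed by the FF of the next marker.
    if head[:3] == b'\xff\xd8\xff':
        return 'jpeg'
    # An HTML error page usually opens with a doctype or an <html> tag.
    text = head.lstrip().lower()
    if text.startswith((b'<!doctype', b'<html')):
        return 'html'
    return 'unknown'
```

To apply it to a file on disk, read the first few hundred bytes in binary mode, for example open('Zindagi1976.jpg', 'rb').read(512), and pass them to the helper.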

Solution

If you use wget, the image downloads without problems:

wget http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg

But if you do the following:

from urllib import urlretrieve
urlretrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 
            'Zindagi1976.jpg')

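You may not be able to download the image. This is likely because Wikimedia's servers are configured to reject requests from bots or unknown clients, and urllib identifies itself with a default Python-urllib User-Agent. (A robots.txt file is only advisory and would not by itself produce an "Access Denied" page.) Try emulating a browser.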
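
To see how urllib identifies itself to the server, you can inspect the headers of a freshly built opener. This is a small sketch that assumes Python 3, where the module is urllib.request; the helper name default_user_agent is made up for illustration.

```python
import urllib.request


def default_user_agent():
    """Return the User-Agent value a fresh urllib opener sends by default."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples sent with every request.
    headers = dict(opener.addheaders)
    return headers['User-agent']


print(default_user_agent())  # a string of the form Python-urllib/<version>
```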

To do that, add the following to the request headers:

('User-agent',
 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
 'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')
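
One place such a tuple can go is the addheaders list of an opener. The following is a minimal sketch, assuming Python 3 and urllib.request; make_opener is a hypothetical helper name, and the User-Agent string is simply the one quoted above.

```python
import urllib.request

# Browser-like identification string, copied from the tuple above.
USER_AGENT = ('Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) '
              'Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')


def make_opener(user_agent=USER_AGENT):
    """Build an opener that sends `user_agent` instead of the default."""
    opener = urllib.request.build_opener()
    # Replace the default ('User-agent', 'Python-urllib/...') entry.
    opener.addheaders = [('User-agent', user_agent)]
    return opener


opener = make_opener()
# Optional: urllib.request.install_opener(opener) makes urlopen() and
# urlretrieve() use this opener for all later calls.
```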

You can do something like this:

>>> from urllib import FancyURLopener
>>> class MyOpener(FancyURLopener):
...     version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'
... 
>>> myopener = MyOpener()
>>> myopener.retrieve('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg', 'Zindagi1976.jpg')
('Zindagi1976.jpg', <httplib.HTTPMessage instance at 0x1007bfe18>)

This retrieves the file. Note that the examples above use the Python 2 urllib module.
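
In Python 3, FancyURLopener has been deprecated since version 3.3, so a rough equivalent can be written with urllib.request.Request instead. This is only a sketch under that assumption, not the method from the answer; build_request and download are hypothetical helper names, and the timeout value is arbitrary.

```python
import shutil
import urllib.request

# Same browser-like string as in the MyOpener example.
USER_AGENT = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) '
              'Gecko/20071127 Firefox/2.0.0.11')


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def download(url, filename, user_agent=USER_AGENT, timeout=30):
    """Stream the response body for `url` into the file `filename`."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request, timeout=timeout) as response, \
            open(filename, 'wb') as out_file:
        shutil.copyfileobj(response, out_file)
    return filename


# Example call (performs a real network request):
# download('http://upload.wikimedia.org/wikipedia/en/4/44/Zindagi1976.jpg',
#          'Zindagi1976.jpg')
```

Setting the header per request keeps the change local to this download, whereas an installed opener affects every later urllib call in the process.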
