How to know if urllib.urlretrieve succeeds?

Question

urllib.urlretrieve returns silently even if the file doesn't exist on the remote HTTP server; it just saves an HTML page to the named file. For example:

urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')

This just returns silently even though abc.jpg doesn't exist on the google.com server. The generated abc.jpg is not a valid JPEG file; it's actually an HTML page. I guess the returned headers (an httplib.HTTPMessage instance) can be used to tell whether the retrieval succeeded or not, but I can't find any documentation for httplib.HTTPMessage.

Can anybody provide some information about this problem?
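
The header idea in the question can be sketched as follows. This is only a minimal sketch under stated assumptions: it targets Python 3, where the function lives at urllib.request.urlretrieve and raises an error on HTTP failure statuses instead of returning silently, and the helper name retrieve_image is hypothetical.

```python
import os
import urllib.error
import urllib.request


def retrieve_image(url, filename):
    """Download url with urlretrieve; return True only if it looks like an image.

    retrieve_image is a hypothetical helper name, not part of urllib.
    """
    try:
        # urlretrieve returns (local_path, headers); headers is an HTTPMessage.
        path, headers = urllib.request.urlretrieve(url, filename)
    except urllib.error.URLError:
        # Covers HTTPError (e.g. 404) and connection failures.
        return False
    if headers.get_content_maintype() != "image":
        # The server answered, but with something else (e.g. an HTML page).
        os.remove(path)
        return False
    return True
```

With this check, a response whose Content-Type is text/html is rejected and the saved file is deleted, rather than being left on disk under a .jpg name.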

Recommended answer

Consider using urllib2 if that is possible in your case. It is more advanced and easier to use than urllib.

You can detect any HTTP errors easily:

>>> import urllib2
>>> resp = urllib2.urlopen("http://google.com/abc.jpg")
Traceback (most recent call last):
<<MANY LINES SKIPPED>>
urllib2.HTTPError: HTTP Error 404: Not Found
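
In a script, the same error can be caught instead of producing a traceback. The following is a minimal sketch, assuming Python 3, where the urllib2 functionality is split across urllib.request and urllib.error; the helper name check_url is hypothetical.

```python
import urllib.error
import urllib.request


def check_url(url):
    """Return (ok, status); ok is True only if no HTTP error was raised.

    check_url is a hypothetical helper name used for illustration.
    """
    try:
        with urllib.request.urlopen(url) as resp:
            return True, resp.status
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status, e.g. 404 for a missing file.
        return False, err.code
```

A call such as check_url("http://google.com/abc.jpg") would then report the failure status instead of raising, which lets the caller decide whether to retry or skip the download.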

resp is a file-like response object that you can do a lot of useful things with:

>>> resp = urllib2.urlopen("http://google.com/")
>>> resp.code
200
>>> resp.headers["content-type"]
'text/html; charset=windows-1251'
>>> resp.read()
"<<ACTUAL HTML>>"

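Putting the status, header, and body access together, a download that refuses unexpected content might look like the sketch below. It assumes Python 3 (urllib.request in place of urllib2); the helper name download and its expected_type parameter are hypothetical, and HTTP errors such as 404 are deliberately left to propagate as urllib.error.HTTPError.

```python
import urllib.request


def download(url, filename, expected_type):
    """Save url to filename only if the server reports expected_type.

    download and expected_type are hypothetical names for illustration.
    Returns the number of bytes written.
    """
    with urllib.request.urlopen(url) as resp:
        # get_content_type() drops parameters such as "; charset=utf-8".
        actual_type = resp.headers.get_content_type()
        if actual_type != expected_type:
            raise ValueError("expected %s, got %s" % (expected_type, actual_type))
        data = resp.read()
    # Write only after the type check, so no bogus file is left behind.
    with open(filename, "wb") as out:
        out.write(data)
    return len(data)
```

Reading the whole body into memory keeps the sketch short; for large files, copying the response to disk in chunks would be the more careful choice.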