How to know if urllib.urlretrieve succeeds?
Problem description
urllib.urlretrieve returns silently even if the file doesn't exist on the remote HTTP server; it just saves an HTML page to the named file. For example:
urllib.urlretrieve('http://google.com/abc.jpg', 'abc.jpg')
This returns silently even though abc.jpg doesn't exist on the google.com server. The generated abc.jpg is not a valid JPEG file; it's actually an HTML page. I guess the returned headers (an httplib.HTTPMessage instance) can be used to tell whether the retrieval succeeded, but I can't find any documentation for httplib.HTTPMessage.
Can anybody provide some information about this problem?
Consider using urllib2 if that is possible in your case. It is more advanced and easier to use than urllib.
You can detect any HTTP errors easily:
>>> import urllib2
>>> resp = urllib2.urlopen("http://google.com/abc.jpg")
Traceback (most recent call last):
<<MANY LINES SKIPPED>>
urllib2.HTTPError: HTTP Error 404: Not Found
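In a script you would normally catch that exception rather than let the traceback escape. Here is a minimal sketch, assuming Python 3, where urllib2 was split into urllib.request and urllib.error. The function name fetch_status and its opener parameter are hypothetical, added only so the logic can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen):
    """Return (ok, status_code) for url instead of raising on HTTP errors.

    `opener` is a hypothetical hook that defaults to urlopen.
    """
    try:
        resp = opener(url)
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return False, err.code
    with resp:
        return True, resp.status
```

Note that failures below the HTTP level (DNS errors, refused connections) raise urllib.error.URLError and are deliberately not caught in this sketch.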
When the request succeeds, resp is a file-like response object that you can do a lot of useful things with:
>>> resp = urllib2.urlopen("http://google.com/")
>>> resp.code
200
>>> resp.headers["content-type"]
'text/html; charset=windows-1251'
>>> resp.read()
"<<ACTUAL HTML>>"
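Putting these pieces together, a replacement for urlretrieve that reports failure might look like the sketch below. It assumes Python 3 (urllib.request in place of urllib2). The names download, expected_type and opener are hypothetical, and checking the Content-Type prefix is only a heuristic for spotting an HTML page served in place of the image.

```python
import shutil
import urllib.error
import urllib.request


def download(url, filename, expected_type="image/", opener=urllib.request.urlopen):
    """Save url to filename; return True only if the download looks valid."""
    try:
        resp = opener(url)
    except urllib.error.URLError:
        # HTTPError (404, 500, ...) is a subclass of URLError, so this also
        # covers DNS failures and refused connections.
        return False
    with resp:
        content_type = resp.headers.get("Content-Type", "")
        if not content_type.startswith(expected_type):
            # For example, an HTML page returned with a 200 status.
            return False
        with open(filename, "wb") as out:
            shutil.copyfileobj(resp, out)
    return True
```

With this, download('http://google.com/abc.jpg', 'abc.jpg') would return False instead of leaving an HTML page on disk under a .jpg name.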