Difference between Python urllib.urlretrieve() and wget

Problem description

I am trying to retrieve a 500 MB file using Python, and I have a script which uses urllib.urlretrieve(). There seems to be some network problem between me and the download site, as this call consistently hangs and fails to complete. However, using wget to retrieve the file tends to work without problems. What is the difference between urlretrieve() and wget that could cause this?
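
For reference, a minimal sketch of the kind of call the question describes, assuming Python 2's urllib (on Python 3 the function is urllib.request.urlretrieve); the URL and filename are placeholders. urlretrieve() takes no timeout argument of its own, so a global socket timeout is the usual way to keep it from hanging indefinitely:

import socket
import urllib

# A stalled connection will now raise an error instead of blocking forever.
socket.setdefaulttimeout(30)

url = "http://example.com/big-file.bin"  # placeholder URL
urllib.urlretrieve(url, "big-file.bin")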

Solution

The answer is quite simple. Python's urllib and urllib2 are nowhere near as mature and robust as they could be. Even better than wget in my experience is cURL. I've written code that downloads gigabytes of files over HTTP, with file sizes ranging from 50 KB to over 2 GB. To my knowledge, cURL is the most reliable piece of software on the planet right now for this task. I don't think Python, wget, or even most web browsers can match it in terms of correctness and robustness of implementation. On a modern enough Python, using urllib2 in exactly the right way, it can be made pretty reliable, but I still run a curl subprocess, and that is absolutely rock solid.
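
As a rough illustration of the "urllib2 used carefully" approach mentioned above (the URL and filename are placeholders), the sketch below sets an explicit timeout and reads in chunks so a stalled connection raises instead of hanging:

import urllib2  # urllib.request on Python 3

url = "http://example.com/big-file.bin"  # placeholder URL
response = urllib2.urlopen(url, timeout=30)  # timeout applies to connect and reads

with open("big-file.bin", "wb") as f:
    while True:
        chunk = response.read(64 * 1024)  # read 64 KB at a time
        if not chunk:
            break
        f.write(chunk)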

Another way to state this is that cURL does one thing only and it does it better than any other software because it has had much more development and refinement. Python's urllib2 is serviceable and convenient and works well enough for small to average workloads, but cURL is way ahead in terms of reliability.

Also, cURL has numerous options to tune its reliability behavior, including retry counts, timeout values, etc.
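
For example, a hedged sketch of the curl-subprocess approach combined with the retry and timeout options mentioned here (URL and output filename are placeholders) might look like:

import subprocess

url = "http://example.com/big-file.bin"  # placeholder URL
output = "big-file.bin"

subprocess.check_call([
    "curl",
    "--fail",                   # treat HTTP errors as failures
    "--location",               # follow redirects
    "--retry", "5",             # retry transient failures up to 5 times
    "--connect-timeout", "30",  # give up if no connection within 30 seconds
    "--continue-at", "-",       # resume a partial download if one exists
    "--output", output,
    url,
])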
