Checking whether a link is dead or not using Python without downloading the webpage
Question
For those who know wget, it has an option --spider, which allows one to check whether a link is broken or not without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100,000 links that I want to check at most once a day and at least once a week. In any case, this will generate a lot of unnecessary traffic.
As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?
Best,
Troels
Answer
You should use a HEAD request for this; it asks the webserver for the headers without the body. See How do you send a HEAD HTTP request in Python 2?
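As a minimal sketch of this approach using Python 3's standard library (urllib.request, the successor to urllib2 mentioned in the question); the helper function name is my own:

```python
import urllib.error
import urllib.request


def is_link_alive(url, timeout=10):
    """Check a URL with a HEAD request, so the server sends headers only.

    Returns True for 2xx/3xx responses, False for 4xx/5xx or network errors.
    """
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status < 400
    except urllib.error.HTTPError as e:
        # HTTPError is raised for 4xx/5xx; the body is still not downloaded.
        return e.code < 400
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, etc.
        return False
```

Note that some servers answer HEAD incorrectly (or not at all) even when the page exists, so for links that fail you may want to fall back to a GET that you abort after reading the status line.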