Checking whether a link is dead or not using Python without downloading the webpage


Question


For those who know wget: it has an option, --spider, which allows one to check whether a link is broken without actually downloading the webpage. I would like to do the same thing in Python. My problem is that I have a list of 100,000 links that I want to check at most once a day and at least once a week. Either way, this will generate a lot of unnecessary traffic.


As far as I understand from the urllib2.urlopen() documentation, it does not download the page but only the meta-information. Is this correct? Or is there some other way to do this in a nice manner?
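A note on the premise here: urllib2.urlopen() issues a GET request, so the server starts sending the body whether or not you read it; only the headers are parsed eagerly. To fetch just the meta-information you have to send a HEAD request explicitly. A minimal sketch of that in Python 3 (the modern equivalent of urllib2 is urllib.request, and the head_status helper name is my own):

```python
import urllib.request

def head_status(url, timeout=10):
    """Fetch only the response headers for *url* via an HTTP HEAD request.

    urlopen() defaults to GET, which makes the server send the body as well;
    passing method="HEAD" asks for the headers alone.
    """
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status, dict(resp.headers)
```

The method="HEAD" keyword exists only in Python 3 (3.3+); under Python 2 / urllib2 you would instead subclass urllib2.Request and override get_method() to return "HEAD", or drop down to httplib, as the answer below suggests.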


Best,
Troels

Answer


You should use a HEAD request for this; it asks the web server for the headers without the body. See How do you send a HEAD HTTP request in Python 2?
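A minimal sketch of such a checker using Python 3's http.client (the question targets Python 2, where the same module is called httplib, but the API is near-identical). The is_link_alive helper and its "status below 400 means alive" rule are assumptions of mine, not part of the original answer:

```python
import http.client
from urllib.parse import urlparse

def is_link_alive(url, timeout=10):
    """Send a HEAD request and report whether the link looks alive.

    Only headers cross the wire; the response body is never downloaded,
    which is what keeps the traffic low for a large list of links.
    """
    parts = urlparse(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        status = conn.getresponse().status
        return status < 400  # assumption: 2xx/3xx = alive, 4xx/5xx = dead
    except OSError:
        # DNS failure, refused connection, timeout, etc.
        return False
    finally:
        conn.close()
```

One caveat worth hedging on: not every server implements HEAD correctly; some answer 405 or behave differently than for GET. For links that fail the HEAD check, a fallback GET that closes the connection without reading the body can reduce false positives.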
