Python script to see if a web page exists without downloading the whole page?
Question
I'm trying to write a script to test for the existence of a web page; it would be nice if it could check without downloading the whole page.
This is my jumping-off point. I've seen multiple examples use httplib in the same way; however, every site I check simply returns false.
import httplib
from httplib import HTTP
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK

if __name__ == "__main__":
    print checkUrl("http://www.stackoverflow.com")  # True
    print checkUrl("http://stackoverflow.com/notarealpage.html")  # False
Any ideas?
Edit
Someone suggested this, but their post was deleted.. does urllib2 avoid downloading the whole page?
import urllib2

def checkUrl(some_url):
    try:
        urllib2.urlopen(some_url)
        return True
    except urllib2.URLError:
        return False
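For what it's worth, `urlopen` issues a GET by default, so the server does start sending the body. In Python 3, however, `urllib.request.Request` accepts a `method` argument, so the same pattern can send a HEAD request and skip the body entirely. A minimal Python 3 sketch of the deleted suggestion (the function name and timeout are illustrative additions):

```python
# Python 3 take on the deleted urllib2 suggestion: send an explicit
# HEAD request so the response body is never transferred.
import urllib.request
import urllib.error

def url_exists(url, timeout=5):
    req = urllib.request.Request(url, method='HEAD')
    try:
        # HTTPError (4xx/5xx) is a subclass of URLError, so both
        # missing pages and connection failures land in the except.
        urllib.request.urlopen(req, timeout=timeout)
        return True
    except urllib.error.URLError:
        return False
```

Note that `urlopen` follows redirects automatically, so a 301 from the root of a site still counts as existing.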
Answer
How about this:
import httplib
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400

if __name__ == '__main__':
    print checkUrl('http://www.stackoverflow.com')  # True
    print checkUrl('http://stackoverflow.com/notarealpage.html')  # False
This will send an HTTP HEAD request and return True if the response status code is < 400.
Note that StackOverflow's root path returns a redirect (301), not 200 OK.
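For newer Python, here is a minimal Python 3 sketch of the same approach: `httplib` became `http.client` and `urlparse` became `urllib.parse`. The HTTPS branch, the timeout, and the empty-path fallback are additions not in the original answer:

```python
# Python 3 sketch of the accepted answer: a HEAD request asks the
# server for headers only, so the page body is never downloaded.
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urlparse

def check_url(url, timeout=5):
    p = urlparse(url)
    conn_cls = HTTPSConnection if p.scheme == 'https' else HTTPConnection
    conn = conn_cls(p.netloc, timeout=timeout)
    try:
        conn.request('HEAD', p.path or '/')
        resp = conn.getresponse()
        return resp.status < 400
    finally:
        conn.close()
```

As with the original, a redirect (3xx) still counts as the page existing, since only status codes of 400 and above are treated as failures.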