Python script to see if a web page exists without downloading the whole page?

Question

I'm trying to write a script to test whether a web page exists. Ideally it would check without downloading the whole page.

This is my starting point. I've seen multiple examples use httplib in the same way, but every site I check simply returns False.

import httplib
from httplib import HTTP
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    h = HTTP(p[1])
    h.putrequest('HEAD', p[2])
    h.endheaders()
    return h.getreply()[0] == httplib.OK

if __name__=="__main__":
    print checkUrl("http://www.stackoverflow.com") # True
    print checkUrl("http://stackoverflow.com/notarealpage.html") # False

Any ideas?

Edit

Someone suggested this, but their post was deleted. Does urllib2 avoid downloading the whole page?

import urllib2

try:
    urllib2.urlopen(some_url)
    return True
except urllib2.URLError:
    return False

Recommended answer

How about this:

import httplib
from urlparse import urlparse

def checkUrl(url):
    p = urlparse(url)
    conn = httplib.HTTPConnection(p.netloc)
    conn.request('HEAD', p.path)
    resp = conn.getresponse()
    return resp.status < 400

if __name__ == '__main__':
    print checkUrl('http://www.stackoverflow.com') # True
    print checkUrl('http://stackoverflow.com/notarealpage.html') # False

This sends an HTTP HEAD request and returns True if the response status code is < 400.
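The answer's code targets Python 2 (httplib, urlparse). A rough Python 3 equivalent might look like the sketch below; it is an illustration under stated assumptions, not a definitive port. The names check_url and status_means_exists, the timeout parameter, the HTTPS handling, and the choice to treat connection errors as "does not exist" are all additions that are not in the original answer.

```python
import http.client
from urllib.parse import urlsplit


def status_means_exists(status):
    """Same rule as the answer: any status below 400 counts as existing."""
    return status < 400


def check_url(url, timeout=10):
    """Send a HEAD request and report whether the page appears to exist."""
    parts = urlsplit(url)
    # Pick the connection class from the scheme (assumption: the original
    # answer only handled plain http).
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        # An empty path is not a valid request target, so fall back to "/".
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        # HEAD asks for the status line and headers only, with no body.
        conn.request("HEAD", path)
        return status_means_exists(conn.getresponse().status)
    except (OSError, http.client.HTTPException):
        # Assumption: DNS failures, refused connections, and timeouts
        # are reported as "does not exist".
        return False
    finally:
        conn.close()
```

Calling check_url('http://stackoverflow.com/notarealpage.html') would then issue a single HEAD request. Some servers reject or mishandle HEAD, so a False result is a hint rather than proof that the page is missing.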

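  • Note that StackOverflow's root path returns a redirect (301), not 200 OK, so the question's comparison against httplib.OK yields False for it.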
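Because a 301 only says the page has moved, it can be useful to follow the redirect and check the final target, still using HEAD requests. The following Python 3 sketch shows one way to do that. The function names head and resolve, the fetch and max_hops parameters, and the set of redirect codes handled are hypothetical choices for illustration, not part of the original answer.

```python
import http.client
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch (an assumption).
REDIRECT_CODES = (301, 302, 303, 307, 308)


def head(url, timeout=10):
    """Return (status, Location header or None) for one HEAD request."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_class = http.client.HTTPSConnection
    else:
        conn_class = http.client.HTTPConnection
    conn = conn_class(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def resolve(url, fetch=head, max_hops=5):
    """Follow up to max_hops redirects; return (final_url, status).

    fetch is injectable (hypothetical design choice) so the redirect
    logic can be exercised without touching the network.
    """
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url, status
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    # Hop limit reached: report whatever the last URL answers with.
    return url, fetch(url)[0]
```

With this, resolve('http://www.stackoverflow.com') would return the final URL together with its status, and the answer's "< 400" rule can be applied to that status. The hop limit guards against redirect loops.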
