Requests - get content-type/size without fetching the whole page/content


Question

I have a simple website crawler. It works fine, but sometimes it gets stuck on large content such as ISO images, .exe files, and other large files. Guessing the content type from the file extension is probably not the best idea.

Is it possible to get the content type and the content length/size without fetching the whole content/page?

Here is my code:

requests.adapters.DEFAULT_RETRIES = 2
url = url.decode('utf8', 'ignore')
urlData = urlparse.urlparse(url)
urlDomain = urlData.netloc
session = requests.Session()
customHeaders = {}
if maxRedirects is None:
    session.max_redirects = self.maxRedirects
else:
    session.max_redirects = maxRedirects
self.currentUserAgent = self.userAgents[random.randrange(len(self.userAgents))]
customHeaders['User-agent'] = self.currentUserAgent
try:
    response = session.get(url, timeout=self.pageOpenTimeout, headers=customHeaders)
    currentUrl = response.url
    currentUrlData = urlparse.urlparse(currentUrl)
    currentUrlDomain = currentUrlData.netloc
    domainWWW = 'www.' + str(urlDomain)
    headers = response.headers
    contentType = str(headers['content-type'])
except Exception:
    logging.basicConfig(level=logging.DEBUG, filename=self.exceptionsFile)
    logging.exception("Get page exception:")
    response = None


Answer

Yes.

You can use the Session.head method to create HEAD requests:

response = session.head(url, timeout=self.pageOpenTimeout, headers=customHeaders)
contentType = response.headers['content-type']

A HEAD request is similar to a GET request, except that the message body is not sent.

Here is what Wikipedia says:


HEAD: Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.
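As a rough sketch of how a crawler might act on the HEAD response, the helpers below parse Content-Type and Content-Length and decide whether a follow-up GET seems worthwhile. The function names, the allowed-type list, and the 5 MB limit are all hypothetical choices for illustration, not part of the original code. Two caveats: Session.head does not follow redirects unless allow_redirects=True is passed, and a server may omit Content-Length or not support HEAD at all, so a missing size is treated here as "unknown" rather than as an error.

```python
def parse_head_headers(headers):
    """Return (media_type, size_in_bytes_or_None) from a header mapping."""
    # Normalize header names so a plain dict works as well as
    # the case-insensitive mapping that requests returns.
    lowered = {str(k).lower(): v for k, v in headers.items()}
    # Drop parameters such as "; charset=utf-8".
    media_type = lowered.get("content-type", "").split(";")[0].strip().lower()
    try:
        size = int(lowered["content-length"])
    except (KeyError, TypeError, ValueError):
        size = None  # header missing or malformed: size is unknown
    return media_type, size


def should_fetch(headers, allowed_types=("text/html",), max_bytes=5 * 1024 * 1024):
    """Decide whether a full GET looks worthwhile (hypothetical policy)."""
    media_type, size = parse_head_headers(headers)
    if media_type not in allowed_types:
        return False
    # Assumption: pages of unknown size are still fetched.
    return size is None or size <= max_bytes


def probe(session, url, timeout=10):
    """Send a HEAD request; `session` is expected to be a requests.Session."""
    # HEAD does not follow redirects by default in requests.
    response = session.head(url, timeout=timeout, allow_redirects=True)
    return response.headers


# Possible usage (performs network I/O, so it is left as a comment):
#   import requests
#   session = requests.Session()
#   if should_fetch(probe(session, "https://example.com/")):
#       response = session.get("https://example.com/", timeout=10)
```

Keeping the decision logic separate from the network call makes it possible to check the policy against plain dictionaries without contacting any server.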
