Getting HEAD content with Python Requests
Question
I'm trying to parse the result of a HEAD request done using the Python Requests library, but can't seem to access the response content.
According to the docs, I should be able to access the content through requests.Response.text. This works fine for GET requests, but returns None for HEAD requests.
GET request (works)
import requests

response = requests.get(url)  # url: address of the page being fetched
content = response.text
# content is "<html>...</html>"
HEAD request (no content)
import requests

response = requests.head(url)  # url: address of the page being fetched
content = response.text
# content is None
Edit
OK, I've quickly realized from the answers that a HEAD request is not supposed to return content, only headers. But does that mean that, to access things found in the <head> tag of a page, like <link> and <meta> tags, one must GET the whole document?
Recommended answer
By definition, responses to HEAD requests do not contain a message body.
Send a GET request if you want to, well, get a response body. Send a HEAD request iff you are only interested in the response status code and headers.
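To make that distinction concrete, here is a minimal sketch, assuming the requests library is installed. The function name head_summary and its optional session parameter are illustrative choices, not part of requests itself.

```python
import requests


def head_summary(url, session=None, timeout=10):
    """Send a HEAD request and report what the response actually carries."""
    # `session` is a hypothetical hook so a caller can reuse a connection pool.
    http = session if session is not None else requests.Session()
    # requests does not follow redirects for HEAD by default, so ask explicitly.
    response = http.head(url, allow_redirects=True, timeout=timeout)
    return {
        "status": response.status_code,
        "content_type": response.headers.get("Content-Type"),
        # A HEAD response has no message body, so this is expected to be 0.
        "body_length": len(response.content),
    }
```

Calling head_summary(url) on a reachable page should give you the status code and headers you would have seen on a GET, with an empty body.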
HTTP transfers arbitrary content; the HTTP term header is completely unrelated to the HTML <head> element. However, HTTP can be asked to download only part of a document. If you know the length of the HTML <head> section (or an upper bound for it), you can include an HTTP Range header in your request that asks the remote server to return only a certain range of bytes. If the remote server supports HTTP ranges, it will then serve the reduced response (status 206 Partial Content).
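The Range approach could look roughly like the sketch below, assuming the requests library is installed. The names fetch_head_markup and extract_head, the 8192-byte default, and the regex-based extraction are all assumptions made for illustration; a real HTML parser would be more robust than the regex. Since a server may ignore the Range header and answer 200 with the full document, the sketch also caps how much it reads on its own.

```python
import re

import requests

# Crude pattern for a complete <head>...</head> section (illustrative only).
HEAD_RE = re.compile(r"<head\b.*?</head\s*>", re.IGNORECASE | re.DOTALL)


def extract_head(html):
    """Return the <head>...</head> markup, or None if it is not fully present."""
    match = HEAD_RE.search(html)
    return match.group(0) if match else None


def fetch_head_markup(url, max_bytes=8192, session=None, timeout=10):
    """Download at most max_bytes of the page and extract its <head> section."""
    http = session if session is not None else requests.Session()
    headers = {
        # Byte ranges are inclusive, so 0..max_bytes-1 covers max_bytes bytes.
        "Range": f"bytes=0-{max_bytes - 1}",
        # Precaution: ask for an uncompressed body so a partial read decodes cleanly.
        "Accept-Encoding": "identity",
    }
    response = http.get(url, headers=headers, stream=True, timeout=timeout)
    try:
        body = b""
        # Stop reading once enough bytes arrived, even if the range was ignored.
        for chunk in response.iter_content(chunk_size=1024):
            body += chunk
            if len(body) >= max_bytes:
                break
    finally:
        response.close()
    text = body[:max_bytes].decode(response.encoding or "utf-8", errors="replace")
    return extract_head(text)
```

If the function returns None, the <head> section was probably longer than max_bytes, and retrying with a larger bound is the obvious next step.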