python-requests: fetching the head of the response content without consuming it all
Question
Using python-requests and python-magic, I would like to test the MIME type of a web resource without fetching all of its content (especially if the resource happens to be, e.g., an Ogg file or a PDF file). Based on the result, I might decide to fetch it all. However, calling the text method after having tested the MIME type only returns what hasn't been consumed yet. How can I test the MIME type without consuming the response content?
Below is my current code.
import requests
import magic
r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(r.iter_content(256).next(), mime=True)
if mime == "text/html":
    print(r.text)  # I'd like r.text to give me the entire response content
Thanks!
Recommended answer
Note: at the time this question was asked, the correct way to fetch only the headers and stream the body was to pass prefetch=False. That option has since been renamed to stream and the boolean value is inverted, so you now want stream=True.
The original answer follows.
Once you use iter_content(), you have to continue using it; .text indirectly uses the same interface under the hood (via .content).
In other words, by using iter_content() at all, you have to do by hand the work that .text normally does:
import magic
import requests
from requests.compat import chardet
# stream=True replaces the old prefetch=False (see the note above)
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = next(r.iter_content(256))  # read only the first chunk
mime = magic.from_buffer(peek, mime=True)
if mime == "text/html":
    # rebuild the full body: the peeked chunk plus everything not yet read
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = str(contents, errors='replace')
    print(textcontent)
This assumes you are using Python 3.
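The "keep using the same iterator" idea can also be packaged as a small helper. The following is only a sketch, not part of requests: peek_chunks is a hypothetical name, and it works on any iterator of byte strings, so it assumes you pass it something like r.iter_content(8192) from a response opened with stream=True.

```python
import itertools


def peek_chunks(chunks, size=256):
    """Return (head, stream) for an iterator of byte strings.

    head   -- up to `size` bytes from the start of the data
    stream -- an iterator that still yields *all* the data, head included
    """
    chunks = iter(chunks)
    buffered = []
    have = 0
    # Pull chunks until there are enough bytes to sniff, or the data ends.
    for chunk in chunks:
        buffered.append(chunk)
        have += len(chunk)
        if have >= size:
            break
    head = b"".join(buffered)[:size]
    # Replay the buffered chunks first, then continue with the unread remainder.
    return head, itertools.chain(buffered, chunks)


# Hypothetical usage with requests and python-magic:
#   r = requests.get(url, stream=True)
#   head, stream = peek_chunks(r.iter_content(8192))
#   if magic.from_buffer(head, mime=True) == "text/html":
#       contents = b"".join(stream)
```

Because the peeked bytes are replayed rather than thrown away, the caller does not need to remember to glue the first chunk back on; decoding the bytes to text is still up to you, as in the code above.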
The alternative to reading everything through iter_content() is to make two requests:
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)
r.close()  # discard the partially read response
if mime == "text/html":
    print(requests.get("http://www.december.com/html/demo/hello.html").text)
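If you go the two-request route, it may be worth wrapping it in a function. The sketch below is an assumption-laden illustration rather than a recommended API: fetch_if_mime is a hypothetical helper, session is expected to be a requests.Session (or anything with a compatible get()), and sniff is any callable mapping bytes to a MIME string, for example lambda b: magic.from_buffer(b, mime=True).

```python
def fetch_if_mime(session, url, sniff, wanted="text/html", size=256):
    """Return the decoded body of `url` if its first bytes sniff as `wanted`.

    Returns None when the sniffed MIME type is something else.
    """
    # First request: stream the body and read only the first chunk.
    with session.get(url, stream=True) as probe:
        head = next(probe.iter_content(size), b"")
    # Leaving the with-block closes the probe without reading the remainder.
    if sniff(head) != wanted:
        return None
    # Second request: an ordinary, non-streamed fetch, so .text works as usual.
    return session.get(url).text


# Hypothetical usage:
#   import magic, requests
#   text = fetch_if_mime(requests.Session(),
#                        "http://www.december.com/html/demo/hello.html",
#                        lambda b: magic.from_buffer(b, mime=True))
```

The trade-off is that the first bytes are downloaded twice and the resource could change between the two requests, so this is mainly attractive when the simplicity of .text matters more than the extra round trip.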
Python 2 version of the iter_content() approach:
# stream=True replaces the old prefetch=False (see the note above)
r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
peek = r.iter_content(256).next()
mime = magic.from_buffer(peek, mime=True)
if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)