python-requests: fetching the head of the response content without consuming it all

Views: 610  Published: 2018/7/9 16:11:05  Tags: python http unicode python-requests

Question

Using python-requests and python-magic, I would like to test the MIME type of a web resource without fetching all of its content (especially if the resource happens to be, e.g., an Ogg file or a PDF file). Based on the result, I might decide to fetch it all. However, accessing the text attribute after having tested the MIME type only returns what hasn't been consumed yet. How can I test the MIME type without consuming the response content?

Below is my current code.

import requests
import magic


r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(r.iter_content(256).next(), mime=True)

if mime == "text/html":
    print(r.text)  # I'd like r.text to give me the entire response content

Thanks!

Recommended answer

Note: at the time this question was asked, the correct way to stream the body (rather than download it up front) was to use prefetch=False. That option has since been renamed to stream and the boolean value is inverted, so you want stream=True.

The original answer follows.

Once you use iter_content(), you have to continue using it; .text indirectly uses the same interface under the hood (via .content).
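The effect can be reproduced without any network access. The following is a plain-Python analogy, not requests' actual implementation: `fake_stream` is a hypothetical stand-in for a response body that can only be read once, and the two reads play the roles of `iter_content()` and `.text`.

```python
def fake_stream(data, chunk_size):
    """Yield successive chunks of data, like a body that can be read only once."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


body = b"<html><body>Hello, world!</body></html>"
stream = fake_stream(body, 8)

peek = next(stream)        # first read: takes the first 8 bytes
rest = b"".join(stream)    # any later read only sees what is left

print(peek)                # b'<html><b'
print(rest == body)        # False: the peeked bytes are missing
print(peek + rest == body) # True: the caller has to glue them back on
```

This is why the peeked bytes have to be kept and prepended by hand in the code that follows.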

In other words, by using iter_content() at all, you have to do by hand the work that .text would otherwise do for you:

import magic
import requests
from requests.compat import chardet

# prefetch=False is the old spelling; newer versions of requests use stream=True
r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
peek = next(r.iter_content(256))  # first 256 bytes only
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    # read the rest of the body and prepend the bytes already consumed
    contents = peek + b''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = str(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = str(contents, errors='replace')
    print(textcontent)

This presumes you use Python 3.
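The decoding step can also be pulled out into a small function, which makes the fallback order easier to see and to test on its own. This is only a sketch: `decode_body` and its `detect` parameter are hypothetical names, with `detect` standing for any callable shaped like `chardet.detect`.

```python
def decode_body(raw, declared_encoding=None, detect=None):
    """Decode raw bytes to str, mirroring the fallback order used above.

    declared_encoding: the encoding the server announced (r.encoding), or None.
    detect: optional callable like chardet.detect, returning {'encoding': ...}.
    """
    encoding = declared_encoding
    if encoding is None and detect is not None:
        encoding = detect(raw)["encoding"]  # may still be None
    try:
        return str(raw, encoding, errors="replace")
    except (LookupError, TypeError):
        # Unknown codec name, or encoding is None: use the default (UTF-8).
        return str(raw, errors="replace")


print(decode_body(b"h\xe9llo", "latin-1"))      # declared encoding is used
print(decode_body(b"hello", "no-such-codec"))   # LookupError, so fallback
print(decode_body(b"hello"))                    # nothing declared, so fallback
```

With a real response, the call would look like `decode_body(contents, r.encoding, chardet.detect)`.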

The alternative is to make two requests:

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
mime = magic.from_buffer(next(r.iter_content(256)), mime=True)

if mime == "text/html":
    # a second, non-streamed request fetches the whole body
    print(requests.get("http://www.december.com/html/demo/hello.html").text)

Python 2 version of the single-request approach:

r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
peek = r.iter_content(256).next()
mime = magic.from_buffer(peek, mime=True)

if mime == "text/html":
    contents = peek + ''.join(r.iter_content(10 * 1024))
    encoding = r.encoding
    if encoding is None:
        # detect encoding
        encoding = chardet.detect(contents)['encoding']
    try:
        textcontent = unicode(contents, encoding, errors='replace')
    except (LookupError, TypeError):
        textcontent = unicode(contents, errors='replace')
    print(textcontent)
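With a current version of requests, the same peek-then-read idea can be written with stream=True, as the note at the top of the answer suggests. The following is a hedged sketch rather than a definitive implementation: `peek_and_rest` and `fetch_if_html` are hypothetical names, and the third-party imports are kept inside `fetch_if_html` so that the pure helper can be tried without requests or python-magic installed.

```python
def peek_and_rest(chunks, read_all):
    """Return the first chunk, plus the full body if read_all(first) is true."""
    chunks = iter(chunks)
    peek = next(chunks, b"")          # an empty body gives b"" instead of raising
    if not read_all(peek):
        return peek, None             # caller decided not to download the rest
    return peek, peek + b"".join(chunks)


def fetch_if_html(url):
    """Download url completely only if its first bytes look like HTML."""
    # Imported here so the helper above stays usable without these packages.
    import magic
    import requests

    # stream=True defers the body; the with block closes the connection
    # even when the rest of the body is never read.
    with requests.get(url, stream=True) as r:
        _, body = peek_and_rest(
            r.iter_content(256),
            lambda head: magic.from_buffer(head, mime=True) == "text/html",
        )
    return body                       # bytes, or None if not HTML


# The helper works on any iterable of byte chunks:
print(peek_and_rest([b"<htm", b"l>"], lambda head: True))
print(peek_and_rest([b"%PDF", b"-1.4"], lambda head: False))
```

`fetch_if_html` returns raw bytes; decoding them to text still has to be done by hand, as in the answer above.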
