Requests: Explanation of the .text format


Question

I'm using the requests module along with Python 2.7 to build a basic web crawler.

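import requests
url = "https://example.com"       # placeholder URL for illustration
source_code = requests.get(url)   # returns a Response object
plain_text = source_code.text     # response body decoded to unicode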

In the lines above, I store the source code of the specified URL, along with other metadata, in the source_code variable. What exactly is the .text attribute in source_code.text? It is not a function, and I couldn't find anything in the documentation that explains where .text comes from or what it does.

Answer

requests.get() returns a Response object, and it is that object that has the .text attribute. It is not the "source code" of the URL; it is an object that gives you access to the source code (the body) of the response, as well as other information such as the status code and headers. The Response.text attribute gives you the body of the response, decoded to unicode. It is a property: you read it like an attribute, and Requests decodes the body at the moment you access it.
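
To make the "attribute, not a function" point concrete, here is a minimal Python 3 sketch of a Response-like class. The class name SketchResponse and its constructor are hypothetical and are not part of the requests API; the sketch only models the idea that .text is a property computed from the raw bytes (.content) and the current .encoding, and it assumes the encoding is already known.

```python
class SketchResponse:
    """Hypothetical stand-in for requests.Response (not the real class)."""

    def __init__(self, content, encoding):
        self.content = content    # raw body bytes, as received
        self.encoding = encoding  # assumed known here, e.g. from the Content-Type header

    @property
    def text(self):
        # Read like an attribute (r.text, no parentheses), but computed on
        # every access by decoding the raw bytes with the current encoding.
        # errors="replace" turns undecodable bytes into U+FFFD.
        return self.content.decode(self.encoding, errors="replace")


r = SketchResponse("héllo".encode("utf-8"), "utf-8")
print(type(r.content).__name__, type(r.text).__name__)  # bytes str
print(r.text)  # héllo
```

Because text is recomputed on each access, changing r.encoding in this sketch changes what r.text returns.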

See the Response Content section of the Quickstart documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text.
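
A simplified Python 3 sketch of the header-based step: read the charset parameter from a Content-Type value and, for text/* types that have none, fall back to ISO-8859-1 as RFC 2616 (section 3.7.1) specifies. The function name encoding_from_content_type is hypothetical; requests has its own helper (requests.utils.get_encoding_from_headers), and its exact rules differ in details.

```python
def encoding_from_content_type(content_type):
    """Hypothetical helper: pick a text encoding from a Content-Type value."""
    if not content_type:
        return None
    media_type, *params = [part.strip() for part in content_type.split(";")]
    for param in params:
        key, _, value = param.partition("=")
        if key.strip().lower() == "charset":
            # Drop optional quotes, e.g. charset="utf-8"
            return value.strip().strip("'\"")
    # RFC 2616, 3.7.1: text/* without an explicit charset defaults to ISO-8859-1
    if media_type.lower().startswith("text/"):
        return "ISO-8859-1"
    return None


print(encoding_from_content_type("text/html; charset=UTF-8"))   # UTF-8
print(encoding_from_content_type("text/plain"))                 # ISO-8859-1
print(encoding_from_content_type("application/octet-stream"))   # None
```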

Further information can be found in the API documentation; see the Response.text entry:

Content of the response, in unicode.

If Response.encoding is None, encoding will be guessed using chardet.
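
The fallback described above (guess only when no encoding is set) can be sketched in Python 3 with a deliberately crude stand-in for chardet. guess_encoding and decode_body are hypothetical helpers, not requests or chardet functions: chardet analyses the bytes statistically, whereas this sketch just tries strict UTF-8 and otherwise falls back to Latin-1, which accepts any byte sequence.

```python
def guess_encoding(body):
    """Crude stand-in for chardet (hypothetical; NOT what requests uses)."""
    try:
        body.decode("utf-8")  # strict decode: raises on invalid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"      # every byte sequence is valid Latin-1


def decode_body(body, encoding=None):
    # Only guess when no encoding was determined, mirroring the documented rule.
    if encoding is None:
        encoding = guess_encoding(body)
    return body.decode(encoding, errors="replace")


print(guess_encoding("naïve".encode("utf-8")))  # utf-8
print(decode_body(b"caf\xe9"))                  # café
```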

The encoding of the response content is determined based solely on HTTP headers, following RFC 2616 to the letter. If you can take advantage of non-HTTP knowledge to make a better guess at the encoding, you should set r.encoding appropriately before accessing this property.
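
Why setting r.encoding first matters: if a server sends UTF-8 bytes but labels them text/html with no charset, the header-based rule yields ISO-8859-1 and .text comes out garbled. This Python 3 sketch reproduces the effect with plain bytes.decode calls; with Requests you would instead assign r.encoding = "utf-8" before reading r.text (mentioned only in a comment, since it needs a live response).

```python
# UTF-8 bytes as a server might send them: "café" -> b"caf\xc3\xa9"
body = "café".encode("utf-8")

# Result if the header-based guess is ISO-8859-1 (text/* with no charset)
wrong = body.decode("iso-8859-1")
# Result once the encoding is set correctly (with Requests: r.encoding = "utf-8")
right = body.decode("utf-8")

print(wrong)  # cafÃ©
print(right)  # café
```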

You can also use Response.content to access the response body undecoded, as raw bytes.
