python requests.get() returns improperly decoded text instead of UTF-8?

Question

When the server sends the header 'Content-Type: text/html', requests.get() returns improperly decoded text.

However, if the server explicitly sends 'Content-Type: text/html; charset=utf-8', it returns correctly decoded text.

Also, when we use urllib.urlopen(), it returns correctly decoded data.

Has anyone noticed this before? Why does requests.get() behave like this?

Solution

From the Requests documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property.

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
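To see why the wrong encoding garbles the text, here is a small stdlib-only illustration. It assumes, as a simplification, that r.text is the raw response bytes (r.content) decoded with r.encoding; the sample body "café" is made up.

```python
# Hypothetical UTF-8 response body, standing in for r.content
raw = "café".encode("utf-8")

# Roughly what r.text gives when r.encoding is 'ISO-8859-1'
as_latin1 = raw.decode("ISO-8859-1", errors="replace")

# Roughly what r.text gives after setting r.encoding = 'utf-8'
as_utf8 = raw.decode("utf-8", errors="replace")

# If you already hold mis-decoded text, reversing the wrong step recovers it
repaired = as_latin1.encode("ISO-8859-1").decode("utf-8")

print(as_latin1)  # cafÃ©
print(as_utf8)    # café
print(repaired)   # café
```

Each UTF-8 multi-byte sequence (here 0xC3 0xA9 for "é") is read as two separate Latin-1 characters, which is the typical symptom described in the question.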

Check which encoding Requests used for your page, and if it is not the right one, try forcing it to the one you need.
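A minimal sketch of that check, assuming you only want to override the encoding when the server's Content-Type header carries no charset. The helper name fix_encoding, the default fallback of 'utf-8', and the FakeResponse class (an offline stand-in for requests.Response) are all hypothetical.

```python
def fix_encoding(response, fallback="utf-8"):
    """Set response.encoding to `fallback` if the server sent no charset.

    `response` is expected to behave like requests.Response (a `headers`
    mapping and a writable `encoding` attribute). `fallback` is an
    assumption about what the page really uses.
    """
    content_type = response.headers.get("Content-Type", "")
    if "charset" not in content_type.lower():
        # No explicit charset, so the guessed encoding may be wrong: override it
        response.encoding = fallback
    return response


class FakeResponse:
    """Hypothetical stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, content_type, encoding):
        self.headers = {"Content-Type": content_type}
        self.encoding = encoding


no_charset = fix_encoding(FakeResponse("text/html", "ISO-8859-1"))
with_charset = fix_encoding(
    FakeResponse("text/html; charset=windows-1252", "windows-1252")
)
print(no_charset.encoding)    # utf-8
print(with_charset.encoding)  # windows-1252
```

With a real response this would be used as r = requests.get(url); fix_encoding(r); text = r.text. Since r.text is computed when accessed, setting r.encoding beforehand is enough. Passing fallback=r.apparent_encoding instead would use the encoding Requests detects from the body content rather than a fixed guess.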

Regarding the differences between requests and urllib.urlopen: they probably use different ways to determine the encoding. That's all.
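For the specific case in the question, the Requests documentation states that when the Content-Type is a text type with no charset parameter, Requests follows RFC 2616 and uses ISO-8859-1, while an explicit charset is used as given. The function below is a simplified, hypothetical re-creation of that rule for illustration only; the real logic lives in requests.utils.get_encoding_from_headers and handles more cases.

```python
def encoding_from_content_type(content_type):
    """Simplified sketch of how Requests chooses r.encoding from the header.

    Hypothetical helper, not Requests' actual source code.
    """
    if not content_type:
        return None  # no Content-Type header: nothing to go on
    mime_type, _, params = content_type.partition(";")
    for param in params.split(";"):
        key, _, value = param.strip().partition("=")
        if key.strip().lower() == "charset" and value.strip():
            # An explicit charset wins, e.g. "text/html; charset=utf-8"
            return value.strip().strip("'\"")
    if mime_type.strip().lower().startswith("text/"):
        # text/* without a charset: the RFC 2616 default
        return "ISO-8859-1"
    return None


for header in ["text/html", "text/html; charset=utf-8", "application/octet-stream"]:
    print(repr(header), "->", encoding_from_content_type(header))
# 'text/html' -> ISO-8859-1
# 'text/html; charset=utf-8' -> utf-8
# 'application/octet-stream' -> None
```

This is why a UTF-8 page served as plain 'text/html' decodes incorrectly, while the same page with 'charset=utf-8' in the header decodes correctly.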
