Python requests.get() returns improperly decoded text instead of UTF-8?

Question

When the server sends the header 'Content-Type:text/html' with no charset, requests.get() returns improperly decoded text.

However, if the content type is given explicitly as 'Content-Type:text/html; charset=utf-8', it returns properly decoded text.

Also, when we use urllib.urlopen(), the data comes back properly encoded.

Has anyone noticed this before? Why does requests.get() behave like this?

Recommended answer

From the Requests documentation:

When you make a request, Requests makes educated guesses about the encoding of the response based on the HTTP headers. The text encoding guessed by Requests is used when you access r.text. You can find out what encoding Requests is using, and change it, using the r.encoding property.

>>> r.encoding
'utf-8'
>>> r.encoding = 'ISO-8859-1'
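
The header-based guess described in the excerpt can be inspected without making a request. The sketch below uses requests.utils.get_encoding_from_headers, the helper Requests applies to response headers, on the two Content-Type values from the question. The header dictionaries are built by hand for illustration; the results shown reflect the documented behavior that a text/* type with no charset falls back to ISO-8859-1.

```python
from requests.structures import CaseInsensitiveDict
from requests.utils import get_encoding_from_headers

# Hand-built headers mirroring the two cases in the question.
without_charset = CaseInsensitiveDict({"Content-Type": "text/html"})
with_charset = CaseInsensitiveDict({"Content-Type": "text/html; charset=utf-8"})

guess_without = get_encoding_from_headers(without_charset)
guess_with = get_encoding_from_headers(with_charset)

print(guess_without)  # ISO-8859-1 (fallback for text/* with no charset)
print(guess_with)     # utf-8 (taken from the charset parameter)
```

So when the server omits the charset, r.text is decoded as ISO-8859-1 even if the body is really UTF-8.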

Check which encoding Requests used for your page, and if it is not the right one, force it to the one you need by assigning to r.encoding before reading r.text.
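
A minimal sketch of forcing the encoding follows. The helper names has_declared_charset and text_with_fallback, the UTF-8 fallback, and the URL in the usage comment are assumptions made for illustration and are not part of Requests. The idea is to override r.encoding only when the server did not declare a charset, so that an explicit declaration is still honored.

```python
import requests


def has_declared_charset(content_type):
    """Return True if a Content-Type header value carries a charset parameter."""
    return "charset=" in (content_type or "").lower()


def text_with_fallback(response, fallback="utf-8"):
    """Return response.text, overriding the guessed encoding only when the
    server sent no charset in its Content-Type header."""
    if not has_declared_charset(response.headers.get("Content-Type")):
        # Assumption: the body is really UTF-8. Adjust for your server.
        response.encoding = fallback
    return response.text


# Usage (hypothetical URL, requires network access):
#   r = requests.get("https://example.com/page", timeout=10)
#   html = text_with_fallback(r)
```

If the real encoding is unknown, assigning r.encoding = r.apparent_encoding is another option; it asks Requests to detect the encoding from the body instead of the headers, at the cost of a slower and less predictable guess.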

Regarding the difference between requests and urllib.urlopen: they handle encoding differently. urllib.urlopen hands back the raw bytes of the body without decoding them, whereas r.text is decoded using the encoding Requests guessed from the headers. That's all.
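
To see why the same bytes can look right from one library and garbled from the other, the standard-library sketch below decodes one UTF-8 byte string two ways. The sample string is made up for illustration; ISO-8859-1 is used as the wrong codec because it is the default Requests applies to text/* responses that carry no charset.

```python
# Bytes as a UTF-8 server would send them (sample text is illustrative).
raw = "caf\u00e9".encode("utf-8")

as_latin1 = raw.decode("iso-8859-1")  # wrong codec: two characters per accent
as_utf8 = raw.decode("utf-8")         # right codec

print(repr(as_latin1))  # 'cafÃ©'
print(repr(as_utf8))    # 'café'

# Already-garbled text can be repaired by reversing the wrong decode step.
repaired = as_latin1.encode("iso-8859-1").decode("utf-8")
print(repaired == as_utf8)  # True
```

The raw bytes are identical in both cases; only the decoding step differs, which is why setting r.encoding (or decoding r.content yourself) fixes the output.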
