Python3: Decode UTF-8 bytes converted as string
Question
Let's say I have something like:
a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)
which returns a string of the form:
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
Now it is sent as a plain string (I get it back as an assertion from the eval function). How can I recover the normal UTF-8 form of the starting word? If there is a better way to pack the text than str(bytes(x)), I would be glad to hear about it.
Recommended answer
If you want to encode and decode text, that's what the encode and decode methods are for:
>>> a = "Gżegżółka"
>>> b = a.encode('utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = b.decode('utf-8')
>>> c
'Gżegżółka'
Also, note that UTF-8 is already the default, so you can just do this:
>>> b = a.encode()
>>> c = b.decode()
The only reasons you need to specify arguments are:
- You need to use some other encoding instead of UTF-8,
- You need to specify a specific error handler, like 'surrogateescape' instead of 'strict', or
- Your code has to run in Python 3.0-3.1 (which almost nobody used).
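The first two cases in the list above can be illustrated with a short sketch. The choice of ISO-8859-2 as the alternative encoding and the damaged byte string raw are assumptions made for this example only; they do not come from the question.

```python
text = "Gżegżółka"

# Case 1: an encoding other than UTF-8 (ISO-8859-2 covers the Polish letters).
latin2 = text.encode("iso-8859-2")
roundtrip = latin2.decode("iso-8859-2")

# Case 2: a non-default error handler. The trailing 0xff byte is not valid UTF-8.
raw = b"G\xc5\xbceg\xff"  # hypothetical damaged input

try:
    raw.decode()  # the default handler is 'strict'
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = raw.decode("utf-8", errors="replace")

# 'surrogateescape' keeps the bad byte as a lone surrogate so it can be
# turned back into the original bytes later.
escaped = raw.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")
```

With 'surrogateescape' the original bytes survive a decode and re-encode cycle, which is why it is useful when the input may not be clean UTF-8.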
However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:
>>> a = "Gżegżółka"
>>> b = bytes(a, 'utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = str(b, 'utf-8')
>>> c
'Gżegżółka'
Calling str on a bytes object without an encoding, as you were doing, doesn't decode it. It also doesn't raise an exception the way calling bytes on a str without an encoding does, because the main job of str is to give you a string representation of the object, and the best string representation of a bytes object is that b'…' form.
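If the string you already hold is that b'…' representation, one possible way to get the word back is to parse it as a bytes literal and then decode it. This is a sketch rather than part of the answer above: it assumes the string is exactly what str() produced from a bytes object, and it uses ast.literal_eval from the standard library, which evaluates literals only and not arbitrary expressions.

```python
import ast

original = "Gżegżółka"

# What the question ends up with: the repr of the bytes, stored as a str.
as_text = str(original.encode("utf-8"))

# Without an encoding argument, str() on bytes gives the same text as repr().
same_as_repr = as_text == repr(original.encode("utf-8"))

# Parse the b'...' literal back into a real bytes object, then decode it.
recovered_bytes = ast.literal_eval(as_text)
recovered = recovered_bytes.decode("utf-8")
```

Sending the result of encode directly, or the text itself, avoids the extra parsing step and the size overhead of the escaped representation.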