Python3: Decode UTF-8 bytes converted as string

Question

Suppose I have something like this:

a = "Gżegżółka"
a = bytes(a, 'utf-8')
a = str(a)

which returns a string of the form:

b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'

Now it's sent as a plain string (I get it as an assertion from an eval function). How can I get back the normal UTF-8 form of the original word? If there is a better compression than str(bytes(x)), I would be glad to hear about it.

Recommended answer

If you want to encode and decode text, that's what the encode and decode methods are for:

>>> a = "Gżegżółka"
>>> b = a.encode('utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = b.decode('utf-8')
>>> c
'Gżegżółka'

Also, notice that UTF-8 is already the default, so you can just do this:

>>> b = a.encode()
>>> c = b.decode()

The only reasons you need to specify arguments are:

  • You need to use some other encoding instead of UTF-8,
  • You need to specify a specific error handler, like 'surrogateescape' instead of 'strict', or
  • Your code has to run in Python 3.0-3.1 (which almost nobody used).
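As a rough illustration of the first two cases in the list above, the sketch below passes a non-default encoding and non-default error handlers. The choice of ISO-8859-2 and the malformed byte string `bad` are assumptions made for the example, not part of the original answer.

```python
text = "Gżegżółka"

# 1. A different encoding: ISO-8859-2 (Latin-2) covers the Polish letters
#    and uses one byte per character, where UTF-8 needs two for ż, ó and ł.
latin2 = text.encode("iso-8859-2")
utf8 = text.encode()  # UTF-8 is the default

# 2. A non-default error handler: the byte 0xff is never valid in UTF-8.
bad = b"G\xff"
try:
    bad.decode("utf-8")  # 'strict' is the default handler and raises
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# 'replace' substitutes U+FFFD for the undecodable byte.
replaced = bad.decode("utf-8", errors="replace")

# 'surrogateescape' maps the byte to a lone surrogate, so the original
# bytes can be restored by encoding with the same handler.
escaped = bad.decode("utf-8", errors="surrogateescape")
restored = escaped.encode("utf-8", errors="surrogateescape")
```

In both cases the same encoding and handler must be used on the way back, otherwise the round trip is not guaranteed to return the original data.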

However, if you really want to, you can do what you were already doing; you just need to explicitly specify the encoding in the str call, just as you did in the bytes call:

>>> a = "Gżegżółka"
>>> b = bytes(a, 'utf-8')
>>> b
b'G\xc5\xbceg\xc5\xbc\xc3\xb3\xc5\x82ka'
>>> c = str(b, 'utf-8')
>>> c
'Gżegżółka'

Calling str on a bytes object without an encoding, as you were doing, doesn't decode it, and doesn't raise an exception like calling bytes on a str without an encoding, because the main job of str is to give you a string representation of the object, and the best string representation of a bytes object is that b'…'.
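If the data has already arrived as a string of the form b'…', one possible way to get the original text back is to parse that representation again. The sketch below is not part of the original answer; it assumes the string is exactly what str() produced for a bytes object, and it uses `ast.literal_eval` from the standard library, which accepts only Python literals and so avoids running arbitrary code the way eval would.

```python
import ast

original = "Gżegżółka"

# str() without an encoding returns the repr of the bytes object,
# i.e. a str that starts with the characters b and '.
as_repr = str(original.encode("utf-8"))

# bytes() on a str without an encoding raises TypeError instead.
try:
    bytes(original)
    raised = False
except TypeError:
    raised = True

# Parse the b'...' literal back into a real bytes object, then decode it.
recovered_bytes = ast.literal_eval(as_repr)
recovered = recovered_bytes.decode("utf-8")
```

Sending the result of encode() directly, or the str itself, avoids the need for this recovery step altogether.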
