How to convert a string to unicode/byte string in Python 3?

Views: 565 · Published: 2020/10/4 19:56:42 · Tags: python, python-3.x, unicode, encode, codec


Question

I know this works:

a = u"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"
print(a) # 方法，删除存储在

But suppose I have a string from a JSON file that does not start with "u" (a = "\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"). I know how to handle it in Python 2 (print unicode(a, encoding='unicode_escape') # Prints 方法，删除存储在). How do I do the same in Python 3?

Similarly, if it is a byte string loaded from a file, how do I convert it?

print("好的".encode("utf-8"))  # b'\xe5\xa5\xbd\xe7\x9a\x84'
# how to convert this?
b = '\xe5\xa5\xbd\xe7\x9a\x84'  # 好的

Recommended answer

If I understand correctly, the file contains the literal text \u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728 (so it is plain ASCII, but with backslash escapes that describe the Unicode ordinals the same way you would write them in a Python str literal). If so, there are two ways to handle this:
 

1. Read the file in binary mode, then call mystr = mybytes.decode('unicode-escape') to convert from bytes to str while interpreting the escapes.
2. Read the file in text mode, and use the codecs module for the "text -> text" conversion, e.g. decodedstr = codecs.decode(encodedstr, 'unicode-escape'). In Python 3, bytes-to-bytes and text-to-text codecs are supported only by the codecs module functions; bytes.decode is purely for bytes to text and str.encode is purely for text to bytes. In Py2, calling str.encode or unicode.decode was usually a mistake, and removing those dangerous methods makes it easier to understand which direction a conversion is supposed to go.
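
The two approaches above can be sketched roughly as follows. This is a minimal illustration rather than part of the original answer: the temporary file name sample.txt and the variable names are assumptions made for the demo. The final lines touch on the second part of the question, assuming the bytes loaded from the file are UTF-8 encoded.

```python
import codecs
import os
import tempfile

# Hypothetical sample content: plain ASCII text with literal backslash escapes
# (a raw string, so Python does not interpret the escapes here).
escaped_text = r"\u65b9\u6cd5\uff0c\u5220\u9664\u5b58\u50a8\u5728"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")  # hypothetical file name
    with open(path, "w", encoding="ascii") as f:
        f.write(escaped_text)

    # Way 1: read bytes, then decode while interpreting the escapes.
    with open(path, "rb") as f:
        mybytes = f.read()
    from_bytes = mybytes.decode("unicode-escape")

    # Way 2: read text, then use codecs for the text -> text conversion.
    with open(path, "r", encoding="ascii") as f:
        encodedstr = f.read()
    from_text = codecs.decode(encodedstr, "unicode-escape")

print(from_bytes)  # 方法，删除存储在
print(from_text)   # 方法，删除存储在

# Second part of the question: real bytes, assumed here to be UTF-8.
raw = b"\xe5\xa5\xbd\xe7\x9a\x84"
decoded = raw.decode("utf-8")
print(decoded)  # 好的

# If the same byte values ended up in a str (as in the question's `b`),
# a Latin-1 round trip recovers the bytes before decoding as UTF-8.
mangled = "\xe5\xa5\xbd\xe7\x9a\x84"
recovered = mangled.encode("latin-1").decode("utf-8")
print(recovered)  # 好的
```

Note that the unicode-escape codec treats unescaped input as Latin-1, so this is only safe when the file really is plain ASCII, as assumed above. If the file is valid JSON, json.loads already interprets \u escapes on its own, which may make the manual step unnecessary.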

