When to use unicode(string) and string.encode('utf-8') in Python

Views: 25 | Published: 2021/9/6 19:12:59 | Tags: python, text, unicode


Question

I had some odd characters coming through with spreadsheet cell data, and I tried to resolve it with encode('utf-8') as was suggested. That didn't resolve the problem, but when I used unicode(string) it worked. My question is: is there a standard way to deal with all types of text data?

Recommended answer

To put it very basically, a "string" (a "unicode string" in Python 2 and just a "string" in Python 3) is a sequence of "characters". But a "character" is an abstraction: there's no way to store a character in a file system or send it over a network (it sounds weird, but there really isn't). File systems, networks, consoles and other devices only understand "bytes". Therefore, it's your job as a programmer to correctly translate characters to bytes, and vice versa, whenever you talk to a device or an external program.

Chars-to-bytes translation is called "encode()" in Python. When you send a string to a device, you "encode()" your characters to bytes:

some_chunk_of_bytes = some_string.encode(how_exactly)
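As a minimal sketch of that step, here is the same text encoded two different ways. It is written for Python 3, where `str` plays the role of Python 2's `unicode`; the sample text and variable names are illustrative assumptions, not part of the original answer.

```python
# Python 3: str holds characters, bytes holds raw byte values.
text = "caf\u00e9"  # the four characters c, a, f, é

utf8_bytes = text.encode("utf-8")      # é becomes two bytes: b'caf\xc3\xa9'
latin1_bytes = text.encode("latin-1")  # é becomes one byte:  b'caf\xe9'

# Same characters, different byte sequences, different lengths.
print(len(text), len(utf8_bytes), len(latin1_bytes))
```

The character count stays at 4 while the byte count depends on the encoding, which is why the encoding always has to be named.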

There are many ways (called "character encodings") to represent a character as a combination of bytes, so you have to tell the encoder exactly how you want it done.

When you read data from somewhere, you only get raw bytes and have to "decode()" them into meaningful characters:

some_string = some_chunk_of_bytes.decode(how_exactly)

Again, you have to specify how you think these bytes are encoded (there's no way to tell for sure).
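To illustrate why the guess matters, the sketch below decodes one byte sequence with two different encodings. It assumes Python 3, and the byte values are made up for the example. Both calls succeed, so the bytes alone do not reveal which result was intended.

```python
raw = b"caf\xc3\xa9"  # bytes as they might arrive from a file or a socket

right = raw.decode("utf-8")    # 'café'  (the encoding the bytes were written in)
wrong = raw.decode("latin-1")  # 'cafÃ©' (no error raised, just the wrong characters)

print(len(right), len(wrong))  # the wrong guess even changes the character count
```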

There are a number of shortcuts in Python that hide this encode/decode stuff from you. For example,

 string = unicode(bytes)

does this behind the scenes:

 string = bytes.decode(default_encoding)  # Python 2: sys.getdefaultencoding(), usually 'ascii'

And when you do something like

print string

what actually happens is:

sys.stdout.write(string.encode(default_encoding))
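The snippets above are Python 2. As a rough Python 3 counterpart, the sketch below shows two comparable shortcuts next to their explicit forms. The `io.BytesIO` object is only a stand-in for a real device, and all names here are assumptions made for illustration.

```python
import io

raw = b"caf\xc3\xa9"

# str(bytes_obj, encoding) is a shortcut for bytes_obj.decode(encoding).
assert str(raw, "utf-8") == raw.decode("utf-8")

# A text-mode stream encodes for you on every write.
device = io.BytesIO()  # stands in for a file, socket or console
stream = io.TextIOWrapper(device, encoding="utf-8", newline="\n")
stream.write("caf\u00e9\n")
stream.flush()

written = device.getvalue()  # b'caf\xc3\xa9\n': the encode step still happened
```

The wrapper hides the call to `encode()`, but the bytes that reach the device are exactly what an explicit `"caf\u00e9\n".encode("utf-8")` would have produced.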

But even if you don't use encode/decode explicitly, you have to realize that it still must take place at some point. If you get garbled characters in your program, it's always because you:

forgot the "encode" step, or
forgot the "decode" step, or
provided an incorrect encoding
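Each of the three mistakes above can be reproduced in a few lines. This is a sketch under Python 3 assumptions, with `io.BytesIO` standing in for a byte-only device; Python 2 behaves differently because it converts implicitly where Python 3 raises an error.

```python
import io

text = "caf\u00e9"  # 'café'

# 1. Forgot to encode: a binary stream rejects str outright.
device = io.BytesIO()
try:
    device.write(text)
    encode_was_needed = False
except TypeError:
    encode_was_needed = True
device.write(text.encode("utf-8"))  # the explicit encode works

# 2. Forgot to decode: str() on bytes gives their repr, not the text.
not_decoded = str(device.getvalue())  # "b'caf\\xc3\\xa9'"

# 3. Wrong encoding: these Latin-1 bytes are not valid UTF-8.
try:
    text.encode("latin-1").decode("utf-8")
    wrong_encoding_detected = False
except UnicodeDecodeError:
    wrong_encoding_detected = True
```

A wrong encoding only raises an error when the bytes happen to be invalid in the assumed encoding; otherwise it silently produces the wrong characters.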

As said, this description is very basic. If you want to understand all the details, please read
