What is the difference between encode/decode?

Question

I've never been sure that I understand the difference between str/unicode decode and encode.

I know that str().decode() is for when you have a string of bytes that you know is in a certain character encoding; given that encoding's name, it returns a unicode string.

I know that unicode().encode() converts unicode chars into a string of bytes according to a given encoding name.

But I don't understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I've gotten wrong above?

Edit:

Several answers give info on what .encode does on a string, but no-one seems to know what .decode does for unicode.

Recommended answer

The decode method of unicode strings really doesn't have any applications at all (unless you have some non-text data in a unicode string for some reason; see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.

unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so:

>>> s = u'ö'
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0:
ordinal not in range(128)

>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0:
ordinal not in range(128)

The error messages are exactly the same.

For str().encode() it's the other way around -- it attempts an implicit decoding of s with the default encoding:

>>> s = 'ö'
>>> s.decode('utf-8')
u'\xf6'
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0:
ordinal not in range(128)

Used like this, str().encode() is also superfluous.
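The implicit step can be made visible by spelling it out. The following is a minimal Python 3 sketch, not the actual Python 2 implementation: the helper name `py2_style_str_encode` is hypothetical, and it assumes the default codec is ASCII, as in the sessions above. It shows that the failure comes from the hidden decode step, before any encoding happens.

```python
def py2_style_str_encode(data, encoding="ascii"):
    """Rough emulation of Python 2's str.encode() (hypothetical helper)."""
    # Step 1 (implicit in Python 2): interpret the bytes as ASCII text.
    text = data.decode("ascii")
    # Step 2 (the explicit request): encode that text with the target codec.
    return text.encode(encoding)


raw = "ö".encode("utf-8")  # two bytes: 0xC3 0xB6

error = None
try:
    py2_style_str_encode(raw, "utf-8")
except UnicodeDecodeError as exc:
    # The non-ASCII byte is rejected by the implicit decode in step 1.
    error = exc

# Pure-ASCII bytes survive the round trip unchanged.
ascii_only = py2_style_str_encode(b"abc", "utf-8")
```

Even though the caller asked for an encode, the exception raised is a UnicodeDecodeError, which matches the traceback shown above.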

But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:

>>> s.encode('zip')
'x\x9c;\xbc\r\x00\x02>\x01z'
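For comparison, here is a sketch of how the same bytes-to-bytes transform can be reached in Python 3, where `bytes` has no `.encode()` method. It assumes the standard library's `zlib_codec` codec name and goes through `codecs.encode` / `codecs.decode`; the variable names are illustrative only.

```python
import codecs
import zlib

data = "ö".encode("utf-8")  # the same two bytes as the 8-bit string above

# Bytes-to-bytes codecs are applied through the codecs module in Python 3.
compressed = codecs.encode(data, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

# The codec is a thin wrapper around zlib, so zlib can read its output.
same_as_zlib = zlib.decompress(compressed)
```

Calling `zlib.compress` directly is the more common choice in Python 3; the codec form is shown only because it mirrors the `s.encode('zip')` call above.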

You are right, though: the ambiguous usage of "encoding" for both these applications is... awkward. Again, with separate byte and string types in Python 3, this is no longer an issue.
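A short Python 3 sketch of that separation, with illustrative variable names: text (`str`) only encodes, bytes only decode, and the two confusing methods from the question no longer exist.

```python
text = "ö"

# str -> bytes: encoding is the only direction available on text.
data = text.encode("utf-8")

# bytes -> str: decoding is the only direction available on bytes.
round_trip = data.decode("utf-8")

# The Python 2 methods asked about in the question are gone.
str_has_decode = hasattr(str, "decode")
bytes_has_encode = hasattr(bytes, "encode")
```

Because each type offers only one direction, the implicit ASCII conversion shown earlier cannot happen by accident.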
