What is the difference between encode/decode?
Question
I've never been sure that I understand the difference between str/unicode decode and encode.
I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding: given that encoding's name, it will return a unicode string.
I know that unicode().encode() converts unicode chars into a string of bytes according to a given encoding name.
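The two directions described above can be sketched in Python 3, assuming its naming: the byte-string type is called bytes and the text type is called str (in Python 2 these were str and unicode). This is an illustration of the question's understanding, not part of the original post:

```python
# Python 3 sketch: bytes.decode() turns bytes into text, str.encode() turns text into bytes.
raw = b'\xc3\xb6'               # the UTF-8 encoding of the character U+00F6 ('ö')
text = raw.decode('utf-8')      # bytes -> str, interpreting the bytes as UTF-8
print(text == '\u00f6')         # True
print(len(raw), len(text))      # 2 1  (two bytes, one character)

back = text.encode('utf-8')     # str -> bytes, using the named encoding
print(back == raw)              # True

latin = text.encode('latin-1')  # same text, different encoding, different bytes
print(latin)                    # b'\xf6'
```

The same text produces different bytes depending on the encoding name, which is why both calls take that name as an argument.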
But I don't understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I've gotten wrong above?
Edit:
Several answers give info on what .encode does on a string, but no one seems to know what .decode does for unicode.
Answer
The decode method of unicode strings really doesn't have any applications at all (unless you have some non-text data in a unicode string for some reason; see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.
Calling s.decode() on a unicode string s performs an implicit encoding of s using the default (ascii) codec. Verify this like so:
>>> s = u'ö'
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)
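What Python 2 does implicitly can be spelled out explicitly. The following Python 3 sketch uses a hypothetical helper, py2_unicode_decode (not a real library function), that mimics Python 2's unicode.decode(): it first encodes the text with the default codec (assumed to be ascii, as in the session above) and only then decodes the resulting bytes with the requested encoding. This is why the failure is a UnicodeEncodeError even though decode was called:

```python
def py2_unicode_decode(text, encoding='ascii', default_encoding='ascii'):
    """Hypothetical helper mimicking Python 2's unicode.decode().

    Step 1: implicitly encode the text with the default codec (assumed ascii).
    Step 2: decode the resulting bytes with the requested encoding.
    """
    implicit_bytes = text.encode(default_encoding)  # may raise UnicodeEncodeError
    return implicit_bytes.decode(encoding)


# Pure-ASCII text survives the round trip unchanged, so the call is a no-op.
print(py2_unicode_decode('abc'))  # abc

# Non-ASCII text fails in the implicit *encoding* step, as in the traceback above.
try:
    py2_unicode_decode('\u00f6')
except UnicodeEncodeError as exc:
    print(type(exc).__name__)  # UnicodeEncodeError
```

For text that the default codec can handle, the result is simply the input again, which is why the method is of so little use.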
The error messages are exactly the same.
For str().encode() it's the other way around: it attempts an implicit decoding of s with the default encoding:
>>> s = 'ö'
>>> s.decode('utf-8')
u'\xf6'
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
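The mirror-image behaviour can be sketched the same way. The helper py2_str_encode below is hypothetical; it mimics Python 2's str.encode() for character-set codecs such as ascii or utf-8 by first decoding the bytes with the default codec (assumed ascii) and then encoding the resulting text. The byte string '\xc3\xb6' stands in for 'ö' typed into a UTF-8 terminal, which is an assumption about the session above:

```python
def py2_str_encode(data, encoding='ascii', default_encoding='ascii'):
    """Hypothetical helper mimicking Python 2's str.encode() for text codecs.

    Step 1: implicitly decode the bytes with the default codec (assumed ascii).
    Step 2: encode the resulting text with the requested encoding.
    """
    implicit_text = data.decode(default_encoding)  # may raise UnicodeDecodeError
    return implicit_text.encode(encoding)


print(py2_str_encode(b'abc'))  # b'abc'

utf8_bytes = '\u00f6'.encode('utf-8')  # b'\xc3\xb6'
try:
    py2_str_encode(utf8_bytes)
except UnicodeDecodeError as exc:
    print(exc)  # 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
```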
Used like this, str().encode() is also superfluous.
But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:
>>> s.encode('zip')
'x\x9c;\xbc\r\x00\x02>\x01z'
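Python 3 keeps bytes-to-bytes codecs like this one, but they are no longer reachable through a method on bytes; instead they go through the codecs module's codecs.encode() and codecs.decode() functions. A minimal sketch, assuming Python 3.4 or later and using the codec's full name 'zlib_codec' (the 'zip' alias is assumed to map to the same codec):

```python
import codecs
import zlib

data = '\u00f6'.encode('utf-8')  # b'\xc3\xb6', the same bytes as s in the session above

# A bytes-to-bytes "encoding": zlib compression, unrelated to any character set.
compressed = codecs.encode(data, 'zlib_codec')
print(compressed == zlib.compress(data))  # True: the codec wraps zlib.compress

# Decoding with the same codec reverses the transformation.
print(codecs.decode(compressed, 'zlib_codec') == data)  # True
```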
You are right, though: the ambiguous usage of "encoding" for both these applications is... awkward. Again, with separate bytes and str types in Python 3, this is no longer an issue.