What is the difference between encode/decode?

Question

I've never been sure that I understand the difference between str/unicode decode and encode.

I know that str().decode() is for when you have a string of bytes that you know has a certain character encoding; given that encoding name, it returns a unicode string.

I know that unicode().encode() converts unicode chars into a string of bytes according to a given encoding name.

But I don't understand what str().encode() and unicode().decode() are for. Can anyone explain, and possibly also correct anything else I've gotten wrong above?

EDIT:

Several answers give info on what .encode does on a string, but no-one seems to know what .decode does for unicode.

Solution

The decode method of unicode strings really doesn't have any applications at all (unless you have some non-text data in a unicode string for some reason -- see below). It is mainly there for historical reasons, I think. In Python 3 it is completely gone.

unicode().decode() will perform an implicit encoding of s using the default (ascii) codec. Verify this like so:

>>> s = u'ö'
>>> s.decode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)

>>> s.encode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 0: ordinal not in range(128)

The error messages are exactly the same.

For str().encode() it's the other way around -- it attempts an implicit decoding of s with the default encoding:

>>> s = 'ö'
>>> s.decode('utf-8')
u'\xf6'
>>> s.encode()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)

Used like this, str().encode() is also superfluous.
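To make the hidden step explicit, here is a rough Python 3 sketch of what the two Python 2 methods appear to do, based on the tracebacks above. The helper names py2_unicode_decode and py2_str_encode are made up for illustration, and the sketch assumes the default codec is ascii, as it is in a stock Python 2 install.

```python
# Python 3 sketch of Python 2's implicit conversions (helper names are hypothetical).

def py2_unicode_decode(text, encoding="ascii"):
    """Roughly what unicode.decode() did: implicit ASCII encode, then decode."""
    raw = text.encode("ascii")   # hidden step; fails for non-ASCII text
    return raw.decode(encoding)


def py2_str_encode(raw, encoding="ascii"):
    """Roughly what str.encode() did: implicit ASCII decode, then encode."""
    text = raw.decode("ascii")   # hidden step; fails for non-ASCII bytes
    return text.encode(encoding)


# "\xf6" is the character 'ö'; b"\xc3\xb6" is its UTF-8 byte sequence.
for func, value in [(py2_unicode_decode, "\xf6"), (py2_str_encode, b"\xc3\xb6")]:
    try:
        func(value)
    except UnicodeError as exc:
        # The error type comes from the hidden step, not the requested one.
        print(func.__name__, "->", type(exc).__name__)
```

This would explain why calling decode can raise a UnicodeEncodeError and calling encode can raise a UnicodeDecodeError: the exception is raised by the implicit conversion that runs first.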

But there is another application of the latter method that is useful: there are encodings that have nothing to do with character sets, and thus can be applied to 8-bit strings in a meaningful way:

>>> s.encode('zip')
'x\x9c;\xbc\r\x00\x02>\x01z'
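For comparison, a minimal Python 3 sketch of the same idea, assuming the bytes-to-bytes transform is reached through codecs.encode with the codec name "zlib_codec" (bytes objects have no encode method in Python 3). The variable names are illustrative only.

```python
import codecs
import zlib

data = "ö".encode("utf-8")   # b'\xc3\xb6', the same two bytes as s above

# Bytes-to-bytes "encoding": compression, unrelated to character sets.
compressed = codecs.encode(data, "zlib_codec")
restored = codecs.decode(compressed, "zlib_codec")

print(restored == data)                     # True
print(zlib.decompress(compressed) == data)  # True: it is ordinary zlib data
```

Calling the zlib module directly is usually the clearer choice in Python 3; the codec form is shown only because it mirrors the 'zip' call above.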

You are right, though: the ambiguous usage of "encoding" for both these applications is... awkward. Again, with separate byte and string types in Python 3, this is no longer an issue.
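As a short illustration of that last point, here is a Python 3 sketch (not part of the original Python 2 session) showing that each type keeps only the one direction that makes sense:

```python
text = "ö"                  # str: a sequence of Unicode code points
raw = text.encode("utf-8")  # str -> bytes
back = raw.decode("utf-8")  # bytes -> str

print(raw)                  # b'\xc3\xb6'
print(back == text)         # True

# The confusing methods no longer exist in Python 3:
print(hasattr(text, "decode"))  # False
print(hasattr(raw, "encode"))   # False
```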
