Python strings and str() method encoding and decoding

Question

I see that the Python manual mentions the .encode() and .decode() string methods. Playing around on the Python CLI, I see that I can create unicode strings (u'hello') with a different datatype than a 'regular' string ('hello'), and that I can convert / cast with str(). But the real problems start when using characters above ASCII 127, such as u'שלום', and I am having a hard time determining empirically exactly what is happening.

Stack Overflow is overflowing with examples of confusion regarding Python's unicode and string encoding/decoding handling (see https://stackoverflow.com/questions/2513027/encoding-gives-ascii-codec-cant-encode-character-ordinal-not-in-range128 and https://stackoverflow.com/questions/8436522/noob-queries-on-unicode-and-str-methods-in-python).

What exactly happens (how are the bytes changed, and how is the datatype changed) when encoding and decoding strings with the str() method, especially when characters that cannot be represented in 7 bits are included in the string? Is it true, as it seems, that a Python variable with datatype <type 'str'> can be both encoded and decoded? If it is encoded, I understand that to mean that the string is represented in UTF-8, ISO-8859-1, or some other encoding. Is this correct? If it is decoded, what does this mean? Are decoded strings unicode? If so, then why don't they have the datatype <type 'unicode'>?

In the interest of those who will read this later, I think that both Python 2 and Python 3 should be addressed. Thank you!

Recommended answer

A str that can be both encoded and decoded is only the case in Python 2. The existence of a decode method on Python 2's strings is a wart, which has been changed in Python 3 (where the equivalent, bytes, has only decode).
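
The Python 3 side of this can be checked directly by looking at which methods each type exposes. This is a minimal sketch, assuming a Python 3 interpreter; the variable names are only for illustration.

```python
# Python 3: text (str) can only be encoded, bytes can only be decoded.
text = "hello"
raw = b"hello"

str_has_encode = hasattr(text, "encode")    # str -> bytes is allowed
str_has_decode = hasattr(text, "decode")    # the Python 2 wart is gone
bytes_has_decode = hasattr(raw, "decode")   # bytes -> str is allowed
bytes_has_encode = hasattr(raw, "encode")   # bytes cannot be "encoded" again

print(str_has_encode, str_has_decode, bytes_has_decode, bytes_has_encode)
# True False True False
```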

You can't 'encode' an already-encoded string. What happens when you do call encode on a str is that Python implicitly calls decode on it using the default encoding, which is usually ASCII. This is almost always not what you want. You should always call decode to convert a str to unicode before converting it to a different encoding.
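
The implicit step described above can be imitated in Python 3, where bytes plays the role of a Python 2 str. This is only a sketch of the behaviour, not Python 2's actual implementation: py2_style_encode is a hypothetical helper written for this example, and ISO-8859-8 is chosen arbitrarily as a target encoding that happens to cover Hebrew.

```python
# Stand-in for a Python 2 str that holds the UTF-8 bytes of u'שלום'.
raw = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8")

def py2_style_encode(data, target):
    """Hypothetical helper: implicit ASCII decode, then the requested encode."""
    return data.decode("ascii").encode(target)

try:
    py2_style_encode(raw, "utf-8")
    implicit_failed = False
except UnicodeDecodeError:
    # The hidden decode step is what fails, even though encode was requested.
    implicit_failed = True

# The recommended route: decode explicitly with the real source encoding,
# then encode to the target encoding.
text = raw.decode("utf-8")
converted = text.encode("iso-8859-8")

print(implicit_failed, len(raw), len(converted))
# True 8 4
```

This also explains why calling encode on a Python 2 str that contains non-ASCII bytes reports a decode error: the failure comes from the implicit ASCII decode, not from the encode that was asked for.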

(And decoded strings are unicode, and they do have type <type 'unicode'>, so I don't know what you mean by that question.)

In Python 3 of course strings are unicode by default. You can only encode them to bytes - which, as I mention above, can only be decoded.
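
As a rough illustration of how the bytes and the datatype change in Python 3, here is a round trip using the Hebrew word from the question. It assumes UTF-8 as the encoding; any codec that covers the characters would work the same way.

```python
text = "\u05e9\u05dc\u05d5\u05dd"    # the Hebrew word from the question
encoded = text.encode("utf-8")       # str -> bytes
decoded = encoded.decode("utf-8")    # bytes -> str

print(type(text).__name__, len(text))        # str 4 (four code points)
print(type(encoded).__name__, len(encoded))  # bytes 8 (two bytes per letter)
print(decoded == text)                       # True

# ASCII only covers 7-bit characters, so this encode cannot succeed.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
print(ascii_ok)                              # False
```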
