Byte string vs. Unicode string in Python


Question

Could you explain in detail what the difference is between a byte string and a Unicode string in Python? I have read this:

Byte code is simply the converted source code into arrays of bytes

Does it mean that Python has its own coding/encoding format? Or does it use the operating system settings? I don't understand. Could you please explain? Thank you!

Solution

No, Python does not use its own encoding. It will use any encoding that it has access to and that you specify. Each character in a str represents one Unicode character (a code point). However, a single byte can only take 256 distinct values, so to represent more than 256 characters, Unicode encodings use more than one byte per character for many characters. bytes objects give you access to the underlying bytes. str objects have the encode method, which takes a string naming an encoding and returns the bytes object that represents the string in that encoding. bytes objects have the decode method, which takes a string naming an encoding and returns the str that results from interpreting the bytes as a string encoded in the given encoding. (bytearray is the mutable counterpart of bytes and has the same decode method.) Here's an example.

>>> a = "αά".encode('utf-8')
>>> a
b'\xce\xb1\xce\xac'
>>> a.decode('utf-8')
'αά'

We can see that UTF-8 uses four bytes, \xce, \xb1, \xce, and \xac, to represent two characters (two bytes per character in this case). After the Spolsky article that Ignacio Vazquez-Abrams referred to, I would read the Python Unicode HOWTO.
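
To make the "any encoding you specify" point more concrete, here is a minimal sketch, assuming Python 3. The variable names are my own and not part of the original answer. It encodes the same str with three different encodings, shows what happens when bytes are decoded with the wrong encoding, and prints the defaults Python falls back on when you don't name an encoding.

```python
import locale
import sys

text = "αά"  # a str holding two Unicode characters

# The same str yields different byte sequences under different encodings.
utf8 = text.encode("utf-8")        # two bytes per character for these letters
utf16 = text.encode("utf-16-le")   # two bytes per character, different values
greek = text.encode("iso-8859-7")  # single-byte Greek encoding

print(len(text), len(utf8), len(utf16), len(greek))

# Decoding with the encoding that produced the bytes restores the text.
assert utf8.decode("utf-8") == text
assert greek.decode("iso-8859-7") == text

# Decoding with a different encoding does not fail here, but the
# result is not the original text.
wrong = utf8.decode("latin-1")
print(repr(wrong))

# An encoding that cannot represent the characters raises an error.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# Default used by str.encode() / bytes.decode() when no name is given.
print(sys.getdefaultencoding())

# Locale-based encoding; this one does depend on the platform settings.
print(locale.getpreferredencoding(False))
```

The last two lines touch on the operating-system part of the question: encode and decode default to UTF-8, while the locale's preferred encoding comes from the platform. Passing the encoding explicitly avoids depending on either default.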
