What is the internal representation of a string in Python 3.x?

Question

In Python 3.x, a string consists of items that are Unicode ordinals. (See the quotation from the language reference below.) What is the internal representation of a Unicode string? Is it UTF-16?

The items of a string object are Unicode code units. A Unicode code unit is represented by a string object of one item and can hold either a 16-bit or 32-bit value representing a Unicode ordinal (the maximum value for the ordinal is given in sys.maxunicode, and depends on how Python is configured at compile time). Surrogate pairs may be present in the Unicode object, and will be reported as two separate items.

Recommended answer

There has been NO CHANGE in the internal Unicode representation between Python 2.x and 3.x. (That holds through Python 3.2; Python 3.3 switched to the flexible string representation of PEP 393.)

It's definitely NOT UTF-16. UTF-anything is a byte-oriented EXTERNAL representation.
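To illustrate the distinction, here is a minimal sketch in which one str object is serialized to several byte-oriented external forms with str.encode. The sample text is an arbitrary choice made for illustration, not something from the question.

```python
# One abstract string, several external byte representations.
text = "A\u00e9\u20ac"  # 'A', LATIN SMALL LETTER E WITH ACUTE, EURO SIGN (all in the BMP)

utf8 = text.encode("utf-8")        # variable width: 1 + 2 + 3 bytes
utf16 = text.encode("utf-16-le")   # 2 bytes per BMP code point, no BOM
utf32 = text.encode("utf-32-le")   # 4 bytes per code point, no BOM

# The string has three items regardless of how it is written out as bytes.
print(len(text), len(utf8), len(utf16), len(utf32))

# Decoding any of the byte forms gives back an equal string.
assert utf8.decode("utf-8") == text
assert utf16.decode("utf-16-le") == text
assert utf32.decode("utf-32-le") == text
```

The byte counts differ per encoding while len(text) stays the same, which is the sense in which the encodings are external to the string object.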

Each code point (character, surrogate, etc.) has been assigned a number from range(0, 2 ** 21); the highest value Unicode actually defines is 0x10FFFF. This number is called its "ordinal".
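The ordinal can be inspected directly with the built-ins ord() and chr(). The following is a small sketch; the specific characters are arbitrary examples chosen here.

```python
# ord() maps a one-item string to its ordinal; chr() goes the other way.
euro_ordinal = ord("\u20ac")          # EURO SIGN
assert euro_ordinal == 0x20AC
assert chr(0x41) == "A"

# Surrogates have ordinals too; they occupy 0xD800..0xDFFF.
lone_surrogate = ord("\ud800")
assert 0xD800 <= lone_surrogate <= 0xDFFF

# The highest ordinal Unicode defines needs exactly 21 bits.
highest = 0x10FFFF
assert highest < 2 ** 21
assert highest.bit_length() == 21
```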

Really, the documentation you quoted says it all. Most Python binaries use 16-bit ordinals, which restricts you to the Basic Multilingual Plane ("BMP") unless you want to muck about with surrogates (handy if you can't find your hair shirt and your bed of nails is off being de-rusted). For working with the full Unicode repertoire, you'd prefer a "wide build" (32 bits wide).
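For readers curious what "mucking about with surrogates" involves, here is a sketch of the standard surrogate-pair arithmetic. The helper names to_surrogate_pair and from_surrogate_pair are hypothetical, introduced only for this example. Note that on CPython 3.3 and later sys.maxunicode is always 0x10FFFF, so the narrow/wide check at the end only distinguishes older builds.

```python
import sys

def to_surrogate_pair(code_point):
    """Split a code point above 0xFFFF into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points beyond the BMP need surrogates")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits
    return high, low

def from_surrogate_pair(high, low):
    """Recombine a (high, low) surrogate pair into one code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogate_pair(0x1F600)  # U+1F600 lies outside the BMP
print(hex(high), hex(low))

# True on a wide build (and on any CPython 3.3+), False on a narrow build.
is_wide = sys.maxunicode > 0xFFFF
print(is_wide)
```

On a narrow build, a string holding U+1F600 would report a length of 2, one item per surrogate, as the quoted documentation describes.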

Briefly, the internal representation of a unicode object is an array of 16-bit unsigned integers, or an array of 32-bit unsigned integers (using only 21 bits).
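The two array shapes can be pictured with a small sketch. It is an illustration only: the hypothetical helper code_units borrows the UTF-16 and UTF-32 codecs purely as a convenient way to produce the integers, and it does not read the interpreter's actual memory.

```python
import struct

def code_units(text, width):
    """Return text as a tuple of unsigned integers of the given bit width."""
    if width == 16:
        data = text.encode("utf-16-le")
        return struct.unpack("<%dH" % (len(data) // 2), data)
    if width == 32:
        data = text.encode("utf-32-le")
        return struct.unpack("<%dI" % (len(data) // 4), data)
    raise ValueError("width must be 16 or 32")

text = "a\u20ac\U0001F600"  # 'a', EURO SIGN, and one code point outside the BMP

narrow = code_units(text, 16)  # what a 16-bit array would hold
wide = code_units(text, 32)    # what a 32-bit array would hold

print([hex(u) for u in narrow])
print([hex(u) for u in wide])
```

The 16-bit form needs four items because the last character becomes a surrogate pair, while the 32-bit form needs three, each below 2 ** 21.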
