unicode 字符串中的内存位置 [英] memory location in unicode strings

查看:56
本文介绍了unicode 字符串中的内存位置的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我知道有人在我创建时解释了原因Python 2.7 中的相等 unicode 字符串它们不指向内存中的相同位置就像在普通"字符串中一样

<预><代码>>>>a1 = 'a'>>>a2 = 'a'>>>a1 是 a2真的

好吧,这是我的预期,但是

<预><代码>>>>ua1 = u'a'>>>ua2 = u'a'>>>ua1 是 ua2错误的

为什么?怎么样?

解决方案

普通字符串不能保证被实习.有时是,有时不是.这些规则很复杂,特定于版本,并且故意没有记录.

您可以依赖这样一个事实:只要有一个好主意,Python 就会尝试将小型的、常用的对象实习.而且,如果您编写任何依赖于 a1 is a2 或相反的代码,它就会在最不方便的时候中断.

如果你想要更多的东西,你必须查看你感兴趣的任何实现的任何版本的源代码.对于 CPython,细节主要在 stringobject.cstringobject.ca href="http://hg.python.org/cpython/file/2.6/Objects/stringobject.c" rel="nofollow noreferrer">2.6 和 2.7, unicodeobject.c for 3.3.

后一个文件当然也存在于 2.x 中(它仍然定义了 unicode 类型,这与 3.x 中的 str 类型不同.X).您可以从 2.7 来源看到 是对实习 unicode 字符串的一些支持,即使您不能对它们调用 intern.乍一看,2.7 似乎可以处理内部的 unicode 字符串,但永远不会创建它们.

与此同时,3.3 使事情变得更加有趣,因为 str 对象可以指向 UTF-8、UTF-16 或 UTF-32 存储,这些存储可能是内部的,但是使用旧式 Unicode API 可能仍会以新副本结束.因此,即使 a1 是 a2,如果您尝试获取它们的字符,它们也可能具有不同的缓冲区.

python 何时选择实习字符串有对细节有更多的了解.但同样,来源才是最重要的.

I know someone explain why when I create equal unicode strings in Python 2.7 they do not point to the same location in memory As in "normal" strings

>>> a1 = 'a'
>>> a2 = 'a'
>>> a1 is a2
True

ok that was what I expected, but

>>> ua1 = u'a'
>>> ua2 = u'a'
>>> ua1 is ua2
False

why? how?

解决方案

Normal strings are not guaranteed to be interned. Sometimes they are, sometimes they aren't. The rules are complicated, version-specific, and intentionally not documented.

You can depend on the fact that Python tries to intern small-ish, commonly-used objects whenever it's a good idea. And that, if you write any code that depends on either a1 is a2 or the converse, it will break whenever it's most inconvenient.

If you want any more than this, you have to look at the source for whichever version of whichever implementation you're interested in. For CPython, the details are mostly inside stringobject.c for 2.6 and 2.7, unicodeobject.c for 3.3.

The latter file of course also exists in 2.x (where it still defines the unicode type, that's just not the same as the str type as in 3.x). You can see from the 2.7 source that there is some support for interning unicode strings, even if you can't call intern on them. From a quick glance, it looks like 2.7 can handle interned unicode strings, but won't ever create them.

Meanwhile, 3.3 makes things even more fun, as a str object can point at UTF-8, UTF-16, or UTF-32 storage, which might be interned, but code that uses the old-style Unicode APIs may still end up with a new copy. So, even if a1 is a2, if you try to get at their characters, they may have different buffers.

When does python choose to intern a string has some more insight into the details. But again, the source is all that matters.

这篇关于unicode 字符串中的内存位置的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆