Memory location in unicode strings
Can someone explain why, when I create equal unicode strings in Python 2.7, they do not point to the same location in memory, as "normal" strings do?
>>> a1 = 'a'
>>> a2 = 'a'
>>> a1 is a2
True
OK, that was what I expected, but:
>>> ua1 = u'a'
>>> ua2 = u'a'
>>> ua1 is ua2
False
Why? How?
Normal strings are not guaranteed to be interned. Sometimes they are, sometimes they aren't. The rules are complicated, version-specific, and intentionally not documented.
The only things you can depend on are that Python tries to intern small-ish, commonly used objects whenever it seems like a good idea, and that any code relying on a1 is a2 being true (or on the converse) will break whenever it's most inconvenient.
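As a minimal sketch of the safer alternative (written for Python 3, where intern lives at sys.intern; in Python 2.x it is the builtin intern and only accepts str), compare strings by value with ==, and intern explicitly if shared identity is really needed. The helper name same_text and the sample strings are made up for illustration.

```python
import sys


def same_text(a, b):
    """Compare by value; this holds whether or not the strings are interned."""
    return a == b


# Build two equal strings at runtime so they are not simply the same constant.
parts = ["uni", "code"]
s1 = "".join(parts)
s2 = "".join(parts)

print(same_text(s1, s2))  # True: equality is guaranteed by the language

# Whether `s1 is s2` holds here is an implementation detail, so it is not tested.

# Explicit interning returns one canonical object for equal strings.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)  # True: both names refer to the interned copy
```

Even with explicit interning, == remains the right way to test whether two strings have the same content; interning is mainly a memory and dictionary-lookup optimization.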
If you want any more than this, you have to look at the source for whichever version of whichever implementation you're interested in. For CPython, the details are mostly inside stringobject.c (http://hg.python.org/cpython/file/2.6/Objects/stringobject.c) for 2.6 and 2.7, and unicodeobject.c for 3.3.
The latter file of course also exists in 2.x (where it still defines the unicode type; that's just not the same as the str type in 3.x). You can see from the 2.7 source that there is some support for interning unicode strings, even though you can't call intern on them. From a quick glance, it looks like 2.7 can handle interned unicode strings, but won't ever create them.
Meanwhile, 3.3 makes things even more fun, as a str object can point at UTF-8, UTF-16, or UTF-32 storage, which might be interned, but code that uses the old-style Unicode APIs may still end up with a new copy. So, even if a1 is a2, if you try to get at their characters, they may have different buffers.
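As a rough illustration of the variable-width storage described above, the sketch below compares sys.getsizeof for three strings of equal length. It assumes CPython 3.3 or later; the exact byte counts differ between versions and other implementations may behave differently, so no specific sizes are shown. It says nothing about interning or about the C-level buffers themselves; it only shows that the per-character storage is not fixed.

```python
import sys

# Three strings of the same length whose widest character differs.
# Assumption: CPython 3.3+ picks the storage width from the widest character.
narrow = "a" * 100            # ASCII only
wide = "\u0101" * 100         # needs more than one byte per character
widest = "\U00010000" * 100   # outside the Basic Multilingual Plane

for label, text in [("narrow", narrow), ("wide", wide), ("widest", widest)]:
    # getsizeof reports the object's memory footprint in bytes.
    print(label, len(text), sys.getsizeof(text))
```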
The question "When does python choose to intern a string" has some more insight into the details. But again, the source is all that matters.