Under which circumstances do equal strings share the same reference?

Question

I have searched the web and Stack Overflow questions but have been unable to find an answer to this question. What I have observed is that in Python 2.7.3, if you assign two variables the same single-character string, e.g.

>>> a = 'a'
>>> b = 'a'
>>> c = ' '
>>> d = ' '

Then the variables will share the same reference:

>>> a is b
True
>>> c is d
True

This is also true for some longer strings:

>>> a = 'abc'
>>> b = 'abc'
>>> a is b
True
>>> '  ' is '  '
True
>>> ' ' * 1 is ' ' * 1
True

However, there are a lot of cases where the reference is (unexpectedly) not shared:

>>> a = 'a c'
>>> b = 'a c'
>>> a is b
False
>>> c = '  '
>>> d = '  '
>>> c is d
False
>>> ' ' * 2 is ' ' * 2
False

Can someone please explain the reason for this?

I suspect the interpreter makes simplifications/substitutions and/or uses some caching mechanism that exploits the immutability of Python strings to optimize some special cases, but what do I know? I tried making deep copies of strings using the str constructor and the copy.deepcopy function, but the strings still share references inconsistently.

I ran into this because I check for inequality of references to strings in some unit tests I'm writing for clone methods of new-style Python classes.

Recommended answer

The details of when strings are cached and reused are implementation-dependent, can change from Python version to Python version and cannot be relied upon. If you want to check strings for equality, use ==, not is.
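To make the difference concrete, here is a minimal sketch that builds two equal strings at run time (so neither comes straight from a literal) and compares them both ways. The variable names are illustrative only. The result of the `is` comparison is deliberately left unstated because it depends on the implementation.

```python
# Build two equal strings at run time rather than writing them as literals.
a = ''.join(['a', ' ', 'c'])
b = ''.join(['a', ' ', 'c'])

values_equal = (a == b)   # compares contents: True for equal strings
same_object = (a is b)    # compares identity: implementation-dependent

print(values_equal)
print(same_object)
```

Only the first comparison is something a test can safely depend on.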

In CPython (the most commonly used Python implementation), string literals that look like identifiers (only letters, digits and underscores) are interned when the code is compiled, and equal string constants within a single compiled unit are merged into one object. That is why 'abc' is shared across separate interactive statements while 'a c' is not, and why '  ' is '  ' is True when both literals sit in the same expression. These are implementation details, not guarantees. In Python 2.x, you can also call the built-in function intern() to force a particular string to be interned (in Python 3 it is sys.intern()), but you shouldn't actually need to.
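As a rough illustration of explicit interning, the sketch below interns two strings built at run time and checks that they end up as the same object. The `intern_str` alias is an assumption introduced here so that the snippet runs on both Python 2 (built-in `intern()`) and Python 3 (`sys.intern()`); it is not part of the original answer.

```python
import sys

try:
    intern_str = intern       # Python 2: built-in function
except NameError:
    intern_str = sys.intern   # Python 3: moved to the sys module

# Equal strings built at run time, then explicitly interned.
a = intern_str(''.join(['a', ' ', 'c']))
b = intern_str(''.join(['a', ' ', 'c']))

print(a is b)  # both names refer to the single interned copy
```

This only shows what interning does. As the answer says, code should still compare strings with ==.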

Edit regarding your actual aim of checking whether attributes are improperly shared between instances: this kind of check is only useful for mutable objects. For attributes of immutable type, there is no semantic difference between shared and unshared objects. You could exclude immutable types from your tests by using

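import numbers
# basestring exists only in Python 2; use str on Python 3
Immutable = basestring, tuple, numbers.Number, frozenset
# ...
if not isinstance(x, Immutable):    # Exclude types known to be immutable
    pass  # perform the identity check on x here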

Note that this would also exclude tuples that contain mutable objects. If you wanted to test those, you would need to recursively descend into tuples.
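One possible way to do that recursive descent is sketched below. The helper names `mutable_parts` and `shares_mutable_state` are hypothetical, and treating `None` as an immutable leaf is an assumption added here. The `string_types` fallback is only there so the sketch runs on both Python 2 and Python 3.

```python
import numbers

try:
    string_types = basestring   # Python 2
except NameError:
    string_types = str          # Python 3

# Leaf types treated as immutable; tuples are handled by recursion instead.
# type(None) is an addition of this sketch, not part of the original list.
IMMUTABLE_LEAVES = (string_types, numbers.Number, frozenset, type(None))


def mutable_parts(value):
    """Yield every non-immutable object reachable from value through tuples."""
    if isinstance(value, tuple):
        for item in value:
            for part in mutable_parts(item):
                yield part
    elif not isinstance(value, IMMUTABLE_LEAVES):
        yield value


def shares_mutable_state(original, clone):
    """Return True if original and clone contain the same mutable object."""
    clone_ids = set(id(part) for part in mutable_parts(clone))
    return any(id(part) in clone_ids for part in mutable_parts(original))


shared = [1, 2]
original = ('x', (1, shared))
bad_clone = ('x', (1, shared))         # reuses the same list object
good_clone = ('x', (1, list(shared)))  # carries its own copy of the list

print(shares_mutable_state(original, bad_clone))
print(shares_mutable_state(original, good_clone))
```

In a clone test, the check would be applied to each attribute pair, so shared strings and numbers are ignored while a list hidden inside a tuple is still caught.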
