In Python, when are two objects the same?


Problem description


It seems that 2 is 2 and 3 is 3 will always be true in python, and in general, any reference to an integer is the same as any other reference to the same integer. The same happens to None (i.e., None is None). I know that this does not happen to user-defined types, or mutable types. But it sometimes fails on immutable types too:

>>> () is ()
True
>>> (2,) is (2,)
False

That is: two independent constructions of the empty tuple yield references to the same object in memory, but two independent constructions of identical one-(immutable-)element tuples end up creating two identical objects. I tested, and frozensets work in a manner similar to tuples.

What determines if an object will be duplicated in memory or will have a single instance with lots of references? Does it depend on whether the object is "atomic" in some sense? Does it vary according to implementation?

Solution

Python has some types that it guarantees will only have one instance. Examples of these instances are None, NotImplemented, and Ellipsis. These are (by definition) singletons and so things like None is None are guaranteed to return True because there is no way to create a new instance of NoneType.

It also supplies a few doubletons[1]: True and False[2] -- all references to True point to the same object. Again, this is because there is no way to create a new instance of bool.
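A quick sketch of what these guarantees buy you in practice (the calls below are standard Python, and hold on any conforming implementation):

```python
# None, True and False are guaranteed unique, so identity checks are reliable.
print(type(None)() is None)  # True: "constructing" NoneType returns the existing None
print(bool(1) is True)       # True: bool() hands back one of the two cached instances
print(bool([]) is False)     # True
```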

The above is all guaranteed by the Python language. However, as you have noticed, there are some types (all immutable) that cache certain instances for reuse. This is allowed by the language, but different implementations may choose to use this allowance or not, depending on their optimization strategies. Some examples that fall into this category are small integers (-5 through 256 in CPython), the empty tuple and the empty frozenset.
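A sketch of the CPython behavior (other implementations may differ); the values are built at runtime so the compiler can't fold them into shared constants:

```python
# Small ints (-5..256) are cached in CPython; larger ones are not.
a = int("256")
b = int("256")
print(a is b)      # True on CPython: both names refer to the cached 256

c = int("257")
d = int("257")
print(c is d)      # False on CPython: 257 falls outside the cache

e1, e2 = tuple([]), tuple([])
print(e1 is e2)    # True on CPython: the empty tuple is a shared instance
```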

Finally, CPython interns certain immutable objects during compilation...

e.g. if you run the following script with CPython, you'll see that it prints True:

def foo():
    return (2,)

if __name__ == '__main__':
    print(foo() is foo())

This seems really odd. The trick CPython is playing is that whenever it compiles the function foo, it sees a tuple literal that contains only other simple (immutable) literals. Rather than create this tuple (or its equivalent) over and over, CPython creates it just once. There's no danger of that object being changed, since the whole thing is immutable. This can be a big win for performance where the same tight loop is called over and over. Small strings are interned as well. The real win here is in dictionary lookups. Python can do a (blazingly fast) pointer compare and then fall back on slower string comparisons when checking hash collisions. Since so much of Python is built on dictionary lookups, this can be a big optimization for the language as a whole.
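One way to see where that pre-built tuple lives is to peek at the function's constants via the standard __code__.co_consts attribute (the folding itself is a CPython implementation detail):

```python
def foo():
    return (2,)

# The constant folder stores the finished tuple among the function's
# constants, so every call just returns that same object.
print(foo() is foo())                  # True on CPython
print((2,) in foo.__code__.co_consts)  # True: the folded tuple is a code constant
```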


[1] I might have just made up that word... but hopefully you get the idea.
[2] Under normal circumstances, you don't need to check whether the object is a reference to True -- usually you just care whether the object is "truthy" -- e.g. whether if some_instance: ... will execute the branch. But I put that in here just for completeness.


Note that is can be used to compare things that aren't singletons. One common use is to create a sentinel value:

sentinel = object()
item = next(iterable, sentinel)
if item is sentinel:
   # iterable exhausted.

Or:

_sentinel = object()
def function(a, b, none_is_ok_value_here=_sentinel):
    if none_is_ok_value_here is _sentinel:
        # Treat the function as if `none_is_ok_value_here` was not provided.

The moral of this story is to always say what you mean. If you want to check whether a value is another value, then use the is operator. If you want to check whether a value is equal to another value (but possibly a distinct object), then use ==. For more details on the difference between is and == (and when to use which), see one of the many existing posts on the topic.
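A minimal illustration of that distinction:

```python
a = [1, 2, 3]
b = [1, 2, 3]
c = a

print(a == b)  # True: the lists hold equal values
print(a is b)  # False: they are two distinct objects
print(a is c)  # True: both names refer to the same object
```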


Addendum

We've talked about these CPython implementation details and we've claimed that they're optimizations. It'd be nice to try to measure just what we get from all this optimizing (other than a little added confusion when working with the is operator).

String "interning" and dictionary lookups.

Here's a small script that you can run to see how much faster dictionary lookups are if you use the same string to look up the value instead of a different string. Note, I use the term "interned" in the variable names -- These values aren't necessarily interned (though they could be). I'm just using that to indicate that the "interned" string is the string in the dictionary.

import timeit

interned = 'foo'
not_interned = (interned + ' ').strip()

assert interned is not not_interned


d = {interned: 'bar'}

print('Timings for short strings')
number = 100000000
print(timeit.timeit(
    'd[interned]',
    setup='from __main__ import interned, d',
    number=number))
print(timeit.timeit(
    'd[not_interned]',
    setup='from __main__ import not_interned, d',
    number=number))


####################################################

interned_long = interned * 100
not_interned_long = (interned_long + ' ').strip()

d[interned_long] = 'baz'

assert interned_long is not not_interned_long
print('Timings for long strings')
print(timeit.timeit(
    'd[interned_long]',
    setup='from __main__ import interned_long, d',
    number=number))
print(timeit.timeit(
    'd[not_interned_long]',
    setup='from __main__ import not_interned_long, d',
    number=number))

The exact values here shouldn't matter too much, but on my computer, the lookup with the short interned string is about 1/7 faster. The long-string lookup is almost 2x faster (because the string comparison takes longer when there are more characters to compare). The differences aren't quite as striking on Python 3.x, but they're still definitely there.
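If you want that pointer-compare fast path for strings built at runtime, the standard sys.intern function gives it to you explicitly (a sketch; the particular strings here are arbitrary):

```python
import sys

# Strings assembled at runtime are normally not interned...
s1 = ''.join(['f', 'o', 'o', 'bar'])
s2 = ''.join(['f', 'o', 'o', 'bar'])
print(s1 is s2)  # False: two distinct objects

# ...but sys.intern maps equal strings onto a single canonical copy,
# restoring the pointer-compare fast path for dict lookups.
print(sys.intern(s1) is sys.intern(s2))  # True
```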

Tuple "interning"

Here's a small script you can play around with:

import timeit

def foo_tuple():
    return (2, 3, 4)

def foo_list():
    return [2, 3, 4]

assert foo_tuple() is foo_tuple()

number = 10000000
t_interned_tuple = timeit.timeit('foo_tuple()', setup='from __main__ import foo_tuple', number=number)
t_list = (timeit.timeit('foo_list()', setup='from __main__ import foo_list', number=number))

print(t_interned_tuple)
print(t_list)
print(t_interned_tuple / t_list)
print('*' * 80)


def tuple_creation(x):
    return (x,)

def list_creation(x):
    return [x]

t_create_tuple = timeit.timeit('tuple_creation(2)', setup='from __main__ import tuple_creation', number=number)
t_create_list = timeit.timeit('list_creation(2)', setup='from __main__ import list_creation', number=number)
print(t_create_tuple)
print(t_create_list)
print(t_create_tuple / t_create_list)

This one is a bit trickier to time (and I'm happy to take better timing ideas in the comments). The gist is that, on average (on my computer), a tuple takes about 60% as long to create as a list does. However, foo_tuple() takes on average only about 40% of the time that foo_list() takes. That shows we really do gain a bit of a speedup from this caching. The time savings seem to increase as the tuple gets larger: creating a longer list takes longer, while the tuple "creation" takes constant time since the tuple was already created.

Also note that I've called this "interning". It actually isn't (at least not in the same sense the strings are interned). We can see the difference in this simple script:

def foo_tuple():
    return (2,)

def bar_tuple():
    return (2,)

def foo_string():
    return 'foo'

def bar_string():
    return 'foo'

print(foo_tuple() is foo_tuple())  # True
print(foo_tuple() is bar_tuple())  # False

print(foo_string() is bar_string())  # True

We see that the strings really are "interned" -- different invocations using the same literal notation return the same object. The tuple "interning" seems to be specific to a single occurrence of the literal: each function's code object carries its own constant.
