What objects are guaranteed to have different identity?

ORIGINAL QUESTION:

(My question applies to Python 3.2+, but I doubt this has changed since Python 2.7.)

Suppose I use an expression that we usually expect to create an object. Examples: [1,2,3]; 42; 'abc'; range(10); True; open('readme.txt'); MyClass(); lambda x : 2 * x; etc.

Suppose two such expressions are executed at different times and "evaluate to the same value" (i.e., have the same type, and compare as equal). Under what conditions does Python provide what I call the "distinct object guarantee", namely that the two expressions actually create two distinct objects (i.e., x is y evaluates as False, assuming the two objects are bound to x and y, and both are in scope at the same time)?

I understand that for objects of any mutable type, the "distinct object guarantee" holds:

x = [1,2]
y = [1,2]
assert x is not y # guaranteed to pass 

I also know for certain immutable types (str, int) the guarantee does not hold; and for certain other immutable types (bool, NoneType), the opposite guarantee holds:

x = True
y = not not x
assert x is not y # guaranteed to fail
x = 2
y = 3 - 1
assert x is not y # implementation-dependent; likely to fail in CPython
x = 1234567890
y = x + 1 - 1
assert x is not y # implementation-dependent; likely to pass in CPython

But what about all the other immutable types?

In particular, can two tuples created at different times have the same identity?
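A small experiment makes the tuple question concrete. The sketch below relies on CPython implementation details, not language guarantees (Sub is a throwaway name used only for illustration). It shows that building an exact tuple does not always produce a new object, while constructing an instance of a tuple subclass does allocate one in CPython.

```python
# CPython-specific behaviour; other implementations may differ.
t = (1, 2)
print(tuple(t) is t)        # True in CPython: tuple() returns an exact tuple unchanged
print(tuple() is tuple())   # True in CPython: the empty tuple is shared

u = tuple([1, 2])           # built at run time from a list
print(u == t, u is t)       # True False: equal value, separate object

class Sub(tuple):
    pass

print(Sub(t) is t)          # False: a subclass instance is freshly allocated
```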

The reason I'm interested in this is that I represent nodes in my graph as tuples of int, and the domain model is such that any two nodes are distinct (even if they are represented by tuples with the same values). I need to create sets of nodes. If Python guarantees that tuples created at different times are distinct objects, I could simply subclass tuple to redefine equality to mean identity:

class DistinctTuple(tuple):
    __hash__ = tuple.__hash__

    def __eq__(self, other):
        return self is other

x = (1, 2)
y = (1, 2)
s = set([x, y])
assert len(s) == 1  # pass; but not what I want
x = DistinctTuple(x)
y = DistinctTuple(y)
s = set([x, y])
assert len(s) == 2  # pass; as desired
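The line __hash__ = tuple.__hash__ in the snippet above is not decoration. In Python 3, a class that defines __eq__ without also defining __hash__ gets __hash__ set to None, so its instances cannot go into a set at all. A minimal sketch (EqOnly is a hypothetical name for the comparison):

```python
class EqOnly(tuple):
    # defines __eq__ but not __hash__
    def __eq__(self, other):
        return self is other

class DistinctTuple(tuple):
    __hash__ = tuple.__hash__   # restore hashing explicitly
    def __eq__(self, other):
        return self is other

print(EqOnly.__hash__)          # None: defining __eq__ alone disables hashing
try:
    {EqOnly((1, 2))}
except TypeError as exc:
    print("unhashable:", exc)

# Equal hashes, but __eq__ says "different", so both elements are kept
print(len({DistinctTuple((1, 2)), DistinctTuple((1, 2))}))  # 2
```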

But if tuples created at different times are not guaranteed to be distinct, then the above is a terrible technique, which hides a dormant bug that may appear at random and may be very hard to replicate and find. In that case, subclassing won't help; I will actually need to add to each tuple, as an extra element, a unique id. Alternatively, I can convert my tuples to lists. Either way, I'd use more memory. Obviously, I'd prefer not to use these alternatives unless my original subclassing solution is unsafe.
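The "extra unique id" alternative mentioned above could look roughly like this. It is a sketch under assumptions: make_node and _node_ids are hypothetical names, and ids come from a process-wide itertools.count().

```python
import itertools

_node_ids = itertools.count()   # hypothetical global source of unique ids

def make_node(*values):
    """Return a tuple whose first element is a fresh unique id (hypothetical helper)."""
    return (next(_node_ids),) + values

a = make_node(1, 2)
b = make_node(1, 2)
print(a != b)          # True: the ids differ, so the tuples compare unequal
print(len({a, b}))     # 2: both nodes survive in an ordinary set
print(a[1:] == b[1:])  # True: the payload is still comparable by value
```

The cost is one extra int per node, which is the memory overhead the question wants to avoid; the benefit is that correctness no longer depends on object identity at all.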

My guess is that Python does not offer the "distinct object guarantee" for immutable types, either built-in or user-defined. But I haven't found a clear statement about it in the documentation.

UPDATE 1:

@LuperRouch @larsmans Thank you for the discussion and the answer so far. Here's the last issue I'm still unclear with:

Is there any chance that the creation of an object of a user-defined type results in a reuse of an existing object?

If this is possible, I'd like to know how I can verify for any class I work with whether it might exhibit such a behavior.

Here's my understanding. Any time an object of a user-defined class is created, the class' __new__() method is called first. If this method is overridden, nothing in the language would prevent the programmer from returning a reference to an existing object, thus violating my "distinct object guarantee". Obviously, I can observe it by examining the class definition.
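As the paragraph above says, nothing stops an overridden __new__() from handing back an existing object. A minimal sketch of such a class (Interned and its cache are hypothetical, written only to show the mechanism):

```python
class Interned:
    """Hypothetical class whose __new__ reuses instances for equal arguments."""
    _cache = {}

    def __new__(cls, value):
        try:
            return cls._cache[value]          # reuse an existing object
        except KeyError:
            obj = super().__new__(cls)        # object.__new__ allocates a new one
            cls._cache[value] = obj
            return obj

    def __init__(self, value):
        self.value = value                    # note: __init__ still runs on every call

a = Interned(1)
b = Interned(1)
print(a is b)             # True: the "distinct object guarantee" is broken
print(Interned(2) is a)   # False: a different value gets its own object
```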

I am not sure what happens if a user-defined class does not override __new__() (or explicitly relies on __new__() from the base class). If I write

class MyInt(int):
  pass

the object creation is handled by int.__new__(). I would expect that this means I may sometimes see the following assertion fail:

x = MyInt(1)
y = MyInt(1)
assert x is not y # may fail, since int.__new__() might return the same object twice?

But in my experimentation with CPython I could not achieve such behavior. Does this mean the language provides the "distinct object guarantee" for user-defined classes that don't override __new__, or is it just arbitrary implementation behavior?

UPDATE 2:

While my DistinctTuple turned out to be a perfectly safe implementation, I now understand that my design idea of using DistinctTuple to model nodes is very bad.

The identity operator is already available in the language; making == behave the same way as the is operator is logically superfluous.

Worse, if == could have done something useful, I made it unavailable. For instance, it's quite likely that somewhere in my program I'll want to see if two nodes are represented by the same pair of integers; == would have been perfect for that, and in fact that's what it does by default...

Worse yet, most people actually do expect == to compare some "value" rather than identity - even for a user-defined class. They would be caught unawares with my override that only looks at identity.

Finally... the only reason I had to redefine == was to allow multiple nodes with the same tuple representation to be part of a set. This is the wrong way to go about it! It's not == behavior that needs to change, it's the container type! I simply needed to use multisets instead of sets.
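Python has no built-in multiset type; collections.Counter is the usual stand-in. A sketch, assuming that recording how many nodes share each tuple representation is all the program needs (a Counter keeps multiplicities, not which particular node is which):

```python
from collections import Counter

# Two distinct nodes happen to share the representation (1, 2)
nodes = Counter([(1, 2), (1, 2), (3, 4)])

print(nodes[(1, 2)])        # 2: both nodes are kept, as a multiplicity
print(sum(nodes.values()))  # 3 nodes in total
print(len(nodes))           # 2 distinct representations
```

Ordinary tuple equality and hashing stay untouched, which addresses the objections to DistinctTuple raised above.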

In short, while my question may have some value for other situations, I am absolutely convinced that creating class DistinctTuple is a terrible idea for my use case (and I strongly suspect it has no valid use case at all).

SOLUTION:

Is there any chance that the creation of an object of a user-defined type results in a reuse of an existing object?

This will happen if, and only if, the user-defined type is explicitly designed to do that, for example by overriding __new__() or by using a metaclass.
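Following the answer's point that reuse needs either __new__() or a metaclass, a rough heuristic can at least tell you where to look. The sketch below (creation_hooks is a hypothetical helper, not a standard API) lists classes in the MRO that define __new__ themselves and metaclasses that define __call__. An empty result means creation falls through to object.__new__ via type.__call__, which always allocates; a non-empty result only means "read that source", since a built-in such as int.__new__ still allocates a fresh object for subclasses.

```python
def creation_hooks(cls):
    """Heuristic sketch: find where object creation for cls may be customised."""
    # Classes (other than object) that define __new__ in their own namespace
    new_owners = [c for c in cls.__mro__
                  if c is not object and '__new__' in vars(c)]
    # Metaclasses (other than type) that define __call__ in their own namespace
    call_owners = [m for m in type(cls).__mro__
                   if m not in (type, object) and '__call__' in vars(m)]
    return new_owners, call_owners

class Plain:
    pass

class MyInt(int):
    pass

print(creation_hooks(Plain))   # ([], [])
print(creation_hooks(MyInt))   # ([<class 'int'>], [])
```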

I'd like to know how I can verify for any class I work with whether it might exhibit such a behavior.

Use the source, Luke.

When it comes to int, small integers are pre-allocated (in CPython, -5 through 256), and these pre-allocated integers are used wherever you create or calculate with integers. You can't get this to happen with MyInt(1) is MyInt(1), because what you have there are not plain ints but MyInt instances. However:

>>> MyInt(1) + MyInt(1) is 2
True

This is because, of course, MyInt(1) + MyInt(1) does not return a MyInt. It returns an int, because that's what the __add__ of an integer returns (and that's also where the check for pre-allocated integers occurs). If anything, this just shows that subclassing int isn't particularly useful in general. :-)
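The same point as a script, avoiding the is-with-a-literal comparison that newer Python versions flag with a SyntaxWarning. The results are CPython behaviour (small-int caching is an implementation detail):

```python
class MyInt(int):
    pass

a, b = MyInt(1), MyInt(1)
print(a is b)            # False: each MyInt(1) call allocates a new object

s = a + b                # int.__add__ returns a plain int, not a MyInt
print(type(s) is int)    # True

two = 2
print(s is two)          # True in CPython: small ints (-5..256) are cached
```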

Does this mean the language provides the "distinct object guarantee" for user-defined classes that don't override __new__, or is it just arbitrary implementation behavior?

It doesn't guarantee it, because there is no need to do so. The default behavior is to create a new object. You have to override it if you don't want that to happen. Having a guarantee makes no sense.
