How unique is Python's id()?

Question

tl;dr Does Python reuse ids? How likely is it that two objects with non-overlapping lifetimes will get the same id?

Background: I've been working on a complex project, written purely in Python 3. I've been seeing some issues in testing and spent a lot of time searching for a root cause. After some analysis, my suspicion was that when the tests are run as a whole (they are orchestrated and run by a dedicated dispatcher), it's reusing some mocked methods instead of instantiating new objects with their original methods. To check whether the interpreter is reusing objects, I used id().

Problem: id() usually works, shows the object identifier and lets me tell when my call is creating a new instance rather than reusing one. But what happens when the ids of two objects are the same? The documentation says:

Return the "identity" of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

The questions:

  1. When can the interpreter reuse id() values? Is it just when it randomly selects the same memory area? If it's just random, it seems extremely unlikely but it's still not guaranteed.

  2. Is there any other method to check which object I am actually referencing? I encountered a situation where I had an object with a mocked method. The object was no longer used, and the garbage collector destroyed it. After that I created a new object of the same class; it got a new id(), but the method got the same id as when it was mocked, and it actually was just a mock.

  3. Is there a way to force Python to destroy a given object instance? From the reading I did, it appears not: it is up to the garbage collector once it sees no references to the object. But I thought it was worth asking anyway.

Solution

Yes, CPython re-uses id() values. Do not count on these being unique in a Python program.

This is clearly documented:

Return the "identity" of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

The last sentence is the key: the id is unique only as long as an object is alive. Objects that have no references left to them are removed from memory, allowing the id() value to be re-used for another object, hence the "non-overlapping lifetimes" wording.
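
A quick way to see that guarantee at work: ids are always distinct for objects that are alive at the same time, but once an object is gone its id may turn up again. This is a minimal sketch with a throwaway class `Foo`. Whether the old id is actually reused depends on the allocator, so the last comparison may print either True or False.

```python
class Foo:
    pass


# Two objects alive at the same time: their ids are guaranteed to differ.
a = Foo()
b = Foo()
print(id(a) != id(b))  # always True

# Record an id, let that object die, then create a new object.
old_id = id(a)
del a
c = Foo()
# Reuse is allowed, never guaranteed: this may print True or False.
print(id(c) == old_id)
```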

Note that this applies to CPython only, the standard implementation provided by python.org. There are other Python implementations, such as IronPython, Jython and PyPy, that make their own choices about how to implement id(), because they each can make distinct choices on how to handle memory and object lifetimes.

To address your specific questions:

  1. In CPython, id() is the memory address. New objects will be slotted into the next available memory space, so if a specific memory address has enough space to hold the next new object, the memory address will be reused. You can see this in the interpreter when creating new objects that are the same size:

    >>> id(1234)
    4546982768
    >>> id(4321)
    4546982768
    

    The 1234 literal creates a new integer object, for which id() produces a numeric value. As there are no further references to the int value, it is removed from memory again. Execute the same expression again with a different integer literal, and chances are you'll see the same id() value. (A garbage collection run breaking cyclic references could free up more memory, so you might also not see the same id() again.)

    So it's not random; in CPython it is a function of the memory allocation algorithms.

  2. If you need to check specific objects, keep your own reference to them. That can be a weak reference (weakref) if all you need to ensure is that the object is still 'alive'.

    For example, recording an object reference first, then later checking it:

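    import weakref

    class Thing:  # stand-in class; any weak-referenceable object works
        pass

    some_object = Thing()

    # record
    object_ref = weakref.ref(some_object)

    # later: check if it's the same object still
    some_other_reference = some_object
    some_other_reference is object_ref()   # only true if they are the same object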
    

    The weak reference won't keep the object alive, but if the object is alive, object_ref() will return it (it'll return None otherwise).

    You could use such a mechanism to generate really unique identifiers, see below.

  3. All you have to do to 'destroy' an object is to remove all references to it. Variables (local and global) are references. So are attributes on other objects, and entries in containers such as lists, tuples, dictionaries, sets, etc.

    The moment all references to an object are gone, the reference count on the object drops to 0 and it is deleted, there and then.

    Garbage collection is only needed to break reference cycles: objects that reference only one another, with no further references to the cycle from outside. Because such a cycle will never reach a reference count of 0 without help, the garbage collector periodically checks for such cycles and breaks one of the references to help clear those objects from memory.

    So you can cause any object to be deleted from memory (freed) by removing all references to it. How you achieve that depends on how the object is referenced. You can ask the interpreter which objects reference a given object with the gc.get_referrers() function, but note that it doesn't give you variable names. It gives you objects, such as the dictionary that is the __dict__ attribute of a module referencing the object as a global. For code fully under your control, use gc.get_referrers() at most as a tool to remind yourself where the object is referenced from as you write the code to remove those references.
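
Weak references make both cases in point 3 visible, because they report whether an object is still alive without keeping it alive. This is a sketch that assumes CPython (reference counting plus a cycle collector). `Node` is a hypothetical class. Automatic collection is switched off while the cycle is inspected so that the collector cannot run at an unpredictable moment. An explicit gc.collect() still works while it is disabled.

```python
import gc
import weakref


class Node:
    """Hypothetical class; plain instances support weak references."""
    other = None


# Case 1: no cycle. Dropping the last reference frees the object at once.
obj = Node()
obj_ref = weakref.ref(obj)
print(obj_ref() is obj)    # True: still alive
del obj                    # refcount drops to 0, so CPython frees it right here
print(obj_ref() is None)   # True

# Case 2: a reference cycle. The refcounts never reach 0 on their own.
gc.disable()               # keep the automatic collector out of the way
try:
    a, b = Node(), Node()
    a.other, b.other = b, a    # a -> b -> a
    a_ref = weakref.ref(a)
    del a, b
    print(a_ref() is not None)  # True: the cycle keeps both objects alive
    gc.collect()                # the cycle collector frees the unreachable pair
    print(a_ref() is None)      # True
finally:
    gc.enable()
```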

If you must have unique identifiers for the lifetime of the Python application, you'd have to implement your own facility. If your objects are hashable and support weak references, then you could just use a WeakKeyDictionary instance to associate arbitrary objects with UUIDs:

from weakref import WeakKeyDictionary
from collections import defaultdict
from uuid import uuid4

class UniqueIdMap(WeakKeyDictionary):
    def __init__(self, dict=None):
        super().__init__()  # don't pass self as the initial mapping
        # replace data with a defaultdict to generate uuids
        self.data = defaultdict(uuid4)
        if dict is not None:
            self.update(dict)

uniqueidmap = UniqueIdMap()

def uniqueid(obj):
    """Produce a unique integer id for the object.

    Object must be *hashable* and support weak references. Id is a UUID and should be unique
    across Python invocations.

    """
    return uniqueidmap[obj].int
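
One caveat with the defaultdict trick: in CPython's weakref module, WeakKeyDictionary.__getitem__ looks a key up with a plain weakref.ref(key) that does not carry the dictionary's removal callback. The entries that defaultdict creates are therefore never removed when their objects die, so dead references and their UUIDs accumulate in data. The variant below is an alternative sketch, not the answer's own code. It assigns the UUID through the normal __setitem__, which does register the cleanup callback. The names mirror the code above for illustration.

```python
from uuid import uuid4
from weakref import WeakKeyDictionary


class UniqueIdMap(WeakKeyDictionary):
    """Map objects to UUIDs; an entry disappears when its object is collected."""

    def __getitem__(self, obj):
        if obj not in self:
            # First time we see this object: store a fresh UUID through
            # __setitem__, which registers the weakref removal callback.
            self[obj] = uuid4()
        return super().__getitem__(obj)


uniqueidmap = UniqueIdMap()


def uniqueid(obj):
    """Return a UUID-based integer id; obj must be hashable and weak-referenceable."""
    return uniqueidmap[obj].int
```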

This still produces integers. Because they are random UUIDs, they are not strictly guaranteed to be unique, but the likelihood that you'll ever encounter the same ID during your lifetime is smaller than that of being hit by a meteorite. See How unique is UUID?

This then gives you unique ids even for objects with non-overlapping lifetimes:

>>> class Foo:
...     pass
...
>>> id(Foo())
4547149104
>>> id(Foo())  # memory address reused
4547149104
>>> uniqueid(Foo())
151797163173960170410969562162860139237
>>> uniqueid(Foo())  # but you still get a unique UUID
188632072566395632221804340107821543671
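
To put a rough number on the "smaller than being hit by a meteorite" remark: a version 4 UUID, which is what uuid4() produces from os.urandom(), has 122 random bits. The birthday approximation gives the chance of any duplicate among n such UUIDs as about 1 - exp(-n(n-1)/2^123), which is roughly n²/2^123 for realistic n. The sketch below only evaluates that approximation and assumes the random source is uniform. The helper name is hypothetical.

```python
import math


def uuid4_collision_probability(n: int) -> float:
    """Approximate chance that n random version-4 UUIDs contain a duplicate.

    Birthday bound: p ~= 1 - exp(-n*(n-1) / (2*N)) with N = 2**122,
    since 6 of the 128 bits are fixed version/variant bits.
    """
    space = 2 ** 122
    # expm1 keeps precision for the tiny exponents involved here
    return -math.expm1(-n * (n - 1) / (2 * space))


for n in (10**6, 10**9, 10**12):
    print(f"{n:>17,} ids -> p ~ {uuid4_collision_probability(n):.3e}")
```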
