numpy ndarray hashability
Question
I have some problems understanding how the hashability of numpy objects is managed.
>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True
How come
- numpy objects define a __hash__ method but are nevertheless not hashable
- a class deriving from numpy.ndarray defines __hash__ and is hashable?
Am I missing something?
I'm using Python 2.7.1 and numpy 1.6.1.
Thanks for any help!
EDIT: added the objects' ids.
Following deinonychusaur's comment, and trying to figure out whether hashing is based on content, I played with numpy.ndarray.dtype and found something I find quite strange:
>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]
I'm puzzled... is there some (type-independent) caching mechanism in numpy?
Answer
I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick search, I found this post on the python.scientific.devel mailing list, which states that arrays have never been intended to be hashable. Why ndarray.__hash__ is defined at all, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.
Note that the value returned by nparray.__hash__() is derived from id(nparray) (it is identical to it on my setup; in the question's Python 2.7.1 session it equals id(nparray) // 16), so this is just the default implementation. Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses and wouldn't stop you from calling ndarray.__hash__ explicitly?
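The ids and hashes printed in the question are consistent with that reading. As a rough check, the arithmetic can be replayed on the numbers from the session. This assumes a 64-bit CPython whose default object hash is the object's address rotated right by 4 bits, which is an implementation detail and not a documented guarantee. The helper name rotate_right_4 and the SESSION table are made up for this sketch.

```python
def rotate_right_4(address, bits=64):
    """Rotate an unsigned `bits`-wide integer right by 4, then read it as signed.

    Assumption: this mirrors how CPython derives the default hash from an
    object's address on a 64-bit build. It is not a public API.
    """
    mask = (1 << bits) - 1
    rotated = ((address >> 4) | (address << (bits - 4))) & mask
    # Reinterpret the top bit as a sign bit, as a C signed integer would.
    if rotated >= 1 << (bits - 1):
        rotated -= 1 << bits
    return rotated


# (id, __hash__() result) pairs copied from the session in the question.
SESSION = {
    "nparray": (4315346832, 269709177),
    "ndarray": (4315234352, 269702147),
    "vector": (4299616456, -9223372036586049780),
    "dtype temporaries": (4317742576, 269858911),
}

if __name__ == "__main__":
    for name, (address, reported_hash) in SESSION.items():
        print(name, rotate_right_4(address) == reported_hash)
```

Under that assumption every pair matches, including the negative hash of vector, whose address has a nonzero low nibble that the rotation moves into the sign bit. It also suggests that the identical hashes in the dtype experiment need no numpy cache: the three temporaries report the same id, most likely because each one is freed before the next is allocated at the same address, and equal ids give equal default hashes.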
Things are different in Python 3.2.2 with the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see the Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False, as expected. hash(vector) also fails.
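The Python 3 behaviour can be reproduced without numpy. The sketch below uses hypothetical stand-in classes (Unhashable, Child, HashableChild) and a made-up helper is_hashable; none of them come from numpy. It only illustrates the __hash__ = None technique on Python 3, where the Hashable ABC is imported from collections.abc.

```python
from collections.abc import Hashable


class Unhashable:
    """Hypothetical stand-in for ndarray: defines __eq__, sets __hash__ to None."""

    __hash__ = None

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Unhashable) and self.value == other.value


class Child(Unhashable):
    """Plain subclass: inherits __hash__ = None, so it is unhashable too."""


class HashableChild(Unhashable):
    """Subclass that opts back in by defining its own __hash__."""

    def __hash__(self):
        return hash(self.value)


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    for obj in (Unhashable(1), Child(1), HashableChild(1)):
        # Prints the class name, whether hash() works, and the ABC check.
        print(type(obj).__name__, is_hashable(obj), isinstance(obj, Hashable))
```

With __hash__ set to None, both the class and its plain subclass are rejected by hash() and by the Hashable check, which matches hash(vector) failing above. A subclass that really needs to be hashable has to define __hash__ itself, and should do so only if instances are not mutated after being used as dictionary keys or set members.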