numpy ndarray hashability

Question

I'm having trouble understanding how the hashability of numpy objects is managed.

>>> import numpy as np
>>> class Vector(np.ndarray):
...     pass
>>> nparray = np.array([0.])
>>> vector = Vector(shape=(1,), buffer=nparray)
>>> ndarray = np.ndarray(shape=(1,), buffer=nparray)
>>> nparray
array([ 0.])
>>> ndarray
array([ 0.])
>>> vector
Vector([ 0.])
>>> '__hash__' in dir(nparray)
True
>>> '__hash__' in dir(ndarray)
True
>>> '__hash__' in dir(vector)
True
>>> hash(nparray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(ndarray)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'numpy.ndarray'
>>> hash(vector)
-9223372036586049780
>>> nparray.__hash__()
269709177
>>> ndarray.__hash__()
269702147
>>> vector.__hash__()
-9223372036586049780
>>> id(nparray)
4315346832
>>> id(ndarray)
4315234352
>>> id(vector)
4299616456
>>> nparray.__hash__() == id(nparray)
False
>>> ndarray.__hash__() == id(ndarray)
False
>>> vector.__hash__() == id(vector)
False
>>> hash(vector) == vector.__hash__()
True

What's going on?

  • numpy objects define a __hash__ method but are nevertheless not hashable
  • a class deriving from numpy.ndarray defines __hash__ and is hashable?

Am I missing something?

I'm using Python 2.7.1 and numpy 1.6.1

Thanks for your help!

Edit: added object ids.

Following deinonychusaur's comment, and trying to figure out whether hashing is based on content, I played with numpy.ndarray.dtype and found something quite strange:

>>> [Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype) for mytype in ('float', 'int', 'float128')]
[Vector([ 1.]), Vector([1]), Vector([ 1.0], dtype=float128)]
>>> [id(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[4317742576, 4317742576, 4317742576]
>>> [hash(Vector(shape=(1,), buffer=np.array([1], dtype=mytype), dtype=mytype)) for mytype in ('float', 'int', 'float128')]
[269858911, 269858911, 269858911]

I'm puzzled... is there some (type-independent) caching mechanism in numpy?

Answer

I get the same results in Python 2.6.6 and numpy 1.3.0. According to the Python glossary, an object should be hashable if __hash__ is defined (and is not None), and either __eq__ or __cmp__ is defined. ndarray.__eq__ and ndarray.__hash__ are both defined and return something meaningful, so I don't see why hash should fail. After a quick google, I found this post on the python.scientific.devel mailing list, which states that arrays have never been intended to be hashable - so why ndarray.__hash__ is defined, I have no idea. Note that isinstance(nparray, collections.Hashable) returns True.

Note that on my setup nparray.__hash__() returns the same as id(nparray), so this is just the default, identity-based implementation. (In the question's Python 2.7 session the two values differ only because 2.7 derives the default hash from the address instead of using it directly: 269709177 is 4315346832 / 16.) Maybe it was difficult or impossible to remove the implementation of __hash__ in earlier versions of Python (the __hash__ = None technique was apparently introduced in 2.6), so they used some kind of C API magic to achieve this in a way that wouldn't propagate to subclasses and wouldn't stop you from calling ndarray.__hash__ explicitly?
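To make the link between id and the default hash concrete, here is a minimal sketch that reproduces the numbers from the question's session. It assumes a 64-bit CPython build and assumes that the default hash is the object's address rotated right by 4 bits, which is my understanding of what CPython 2.7 and later do. The helper name default_hash_from_id is made up for this illustration and is not part of Python or numpy.

```python
POINTER_BITS = 64  # assumption: 64-bit build, as the ids in the question suggest


def default_hash_from_id(address, bits=POINTER_BITS):
    """Rotate `address` right by 4 bits and read the result as a signed integer.

    Hypothetical helper: a sketch of the id-based default hash, not an
    actual Python or numpy function.
    """
    mask = (1 << bits) - 1
    rotated = ((address >> 4) | (address << (bits - 4))) & mask
    # Reinterpret the unsigned value as a signed integer of the same width.
    if rotated >= 1 << (bits - 1):
        rotated -= 1 << bits
    return rotated


# (id, hash) pairs copied from the interpreter session in the question.
observed = [
    (4315346832, 269709177),             # nparray
    (4315234352, 269702147),             # ndarray
    (4299616456, -9223372036586049780),  # vector
    (4317742576, 269858911),             # temporary Vector objects from the edit
]

for address, expected in observed:
    assert default_hash_from_id(address) == expected
```

All four pairs match, so none of these hashes depend on the array contents or dtype. The three identical hashes in the edit simply follow from the three identical ids, presumably because each temporary Vector is freed before the next one is created and its memory address is reused.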

Things are different in Python 3.2.2 and the current numpy 2.0.0 from the repo. The __cmp__ method no longer exists, so hashability now requires __hash__ and __eq__ (see Python 3 glossary). In this version of numpy, ndarray.__hash__ is defined, but it is just None, so it cannot be called. hash(nparray) fails and isinstance(nparray, collections.Hashable) returns False as expected. hash(vector) also fails.
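The __hash__ = None behaviour can be seen without numpy. The sketch below uses two made-up classes, Unhashable and Derived, as stand-ins for ndarray and Vector, plus a made-up helper is_hashable. It is an illustration of the mechanism, not numpy's actual implementation. It imports Hashable from collections.abc, which is where the ABC lives in current Python 3 (the collections.Hashable alias used above was removed in Python 3.10).

```python
from collections.abc import Hashable


class Unhashable:
    """Hypothetical stand-in for a type that opts out of hashing."""

    __hash__ = None

    def __eq__(self, other):
        return self is other


class Derived(Unhashable):
    """A plain subclass inherits __hash__ = None, so it is unhashable too."""


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


for obj in (Unhashable(), Derived()):
    print(
        type(obj).__name__,
        "__hash__" in dir(obj),     # the attribute exists, but it is None
        isinstance(obj, Hashable),  # the ABC treats None as "not hashable"
        is_hashable(obj),           # hash() raises TypeError
    )
```

So finding '__hash__' in dir(obj) says nothing about hashability on its own. The isinstance check against Hashable, or simply calling hash(), is the more reliable test.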
