How to use mask indexing on numpy arrays of classes?
Question
When working with a numpy array of custom classes like:
class TestClass:
    active = False
How can I use the inline masking (boolean index arrays) described here? http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays
A direct attempt fails:
items = np.array([TestClass() for _ in range(10)])
items[items.active]
AttributeError: 'numpy.ndarray' object has no attribute 'active'
Any suggestions?
Recommended answer
So your array is dtype=object (print it to check), and each element points to an instance of your class:
items = np.array([TestClass() for _ in range(10)])
Now try:
items.active
items is an array; active is an attribute of your class, not an attribute of the array of your objects. Your definition does not add any functionality to the class ndarray. The error isn't in the masking; it's in trying to get the instance attribute from the array.
Many operations on arrays like this have to be done iteratively. This kind of array is similar to a plain Python list.
[obj.active for obj in items]
or turn the result back into an array:
np.array([obj...])
items[[True,False,True,...]] should work, but that's because the mask is a boolean list or array already.
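As a minimal sketch of the iterative approach, the question's class can be kept as it is and the mask built with a list comprehension. Setting active on two instances by hand is my own addition, purely so that the mask is not all False:

```python
import numpy as np

class TestClass:
    active = False  # class attribute, as in the question

items = np.array([TestClass() for _ in range(5)])  # dtype=object

# Illustration only: flip the flag on two instances.
items[1].active = True
items[3].active = True

# Collect the attribute element by element, then convert to a boolean array.
mask = np.array([obj.active for obj in items], dtype=bool)

# Ordinary boolean indexing now works, because the mask is a real bool array.
selected = items[mask]
print(mask)
print(len(selected))
```

The indexing step itself is plain numpy; only the construction of the mask needs Python-level iteration.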
===================
Let's modify your class so it shows something interesting. Note that I am assigning active to instances, not, as you did, to the class:
In [1671]: class TestClass:
      ...:     def __init__(self,val):
      ...:         self.active = bool(val%2)
In [1672]: items = np.array([TestClass(i) for i in range(10)])
In [1674]: items
Out[1674]:
array([<__main__.TestClass object at 0xb106758c>,
<__main__.TestClass object at 0xb117764c>,
...
<__main__.TestClass object at 0xb269850c>], dtype=object)
# print of the array isn't interesting. The class needs a `__str__` method.
This is simple iterative access to the attribute:
In [1675]: [i.active for i in items]
Out[1675]: [False, True, False, True, False, True, False, True, False, True]
np.frompyfunc provides a more powerful way of accessing each element of an array. operator.attrgetter('active')(i), from the standard-library operator module (so import operator first), is a functional way of doing i.active.
In [1676]: f=np.frompyfunc(operator.attrgetter('active'),1,1)
In [1677]: f(items)
Out[1677]: array([False, True, False, True, False, True, False, True, False, True], dtype=object)
But the main advantage of this function appears when I change the shape of the array:
In [1678]: f(items.reshape(2,5))
Out[1678]:
array([[False, True, False, True, False],
[True, False, True, False, True]], dtype=object)
Note that this array is dtype object. That's what frompyfunc does. To get an array of booleans we need to change the type:
In [1679]: f(items.reshape(2,5)).astype(bool)
Out[1679]:
array([[False, True, False, True, False],
[ True, False, True, False, True]], dtype=bool)
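Putting the pieces together, the following sketch uses the frompyfunc result as a mask on the reshaped array. The extra val attribute is my own addition, there only to make the selected instances identifiable:

```python
import operator
import numpy as np

class TestClass:
    def __init__(self, val):
        self.val = val                # hypothetical extra attribute, for illustration
        self.active = bool(val % 2)

items = np.array([TestClass(i) for i in range(10)]).reshape(2, 5)

# frompyfunc applies the getter to every element and returns an object array.
get_active = np.frompyfunc(operator.attrgetter('active'), 1, 1)

# Cast to bool so numpy treats the result as a mask rather than as objects.
mask = get_active(items).astype(bool)

selected = items[mask]                # boolean indexing flattens the result to 1-D
vals = [obj.val for obj in selected]
print(vals)
```

Indexing a 2-D array with a 2-D boolean mask returns a 1-D array of the selected elements, so the original shape is not preserved.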
np.vectorize uses frompyfunc, and makes the dtype a little more user friendly. But in timings it's a bit slower.
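A hedged sketch of the np.vectorize variant: passing otypes=[bool] asks for a boolean result directly, so no astype step is needed. The lambda is just one way to write the getter, and the class is the same illustrative one used in this answer:

```python
import numpy as np

class TestClass:
    def __init__(self, val):
        self.active = bool(val % 2)

items = np.array([TestClass(i) for i in range(10)]).reshape(2, 5)

# otypes fixes the output dtype up front instead of inferring it from a first call.
get_active = np.vectorize(lambda obj: obj.active, otypes=[bool])

mask = get_active(items)    # already a bool array, same shape as items
selected = items[mask]
print(mask.dtype, selected.shape)
```

This is still a Python-level loop under the hood, so it is a convenience rather than a speedup.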
===============
Expanding on Jon's comment:
In [1702]: class TestClass:
      ...:     def __init__(self,val):
      ...:         self.active = bool(val%2)
      ...:     def __bool__(self):
      ...:         return self.active
      ...:     def __str__(self):
      ...:         return 'TestClass(%s)'%self.active
      ...:     def __repr__(self):
      ...:         return str(self)
In [1707]: items = np.array([TestClass(i) for i in range(5)])
items now displays in an informative manner, and converts to strings:
In [1708]: items
Out[1708]:
array([TestClass(False), TestClass(True), TestClass(False),
TestClass(True), TestClass(False)], dtype=object)
In [1709]: items.astype('S20')
Out[1709]:
array([b'TestClass(False)', b'TestClass(True)', b'TestClass(False)',
b'TestClass(True)', b'TestClass(False)'],
dtype='|S20')
and converts to bool:
In [1710]: items.astype(bool)
Out[1710]: array([False, True, False, True, False], dtype=bool)
In effect astype is applying the conversion method to each element of the array. We could also define __int__, __add__, and so on. This shows that it is easier to add functionality to the custom class than to the array class itself. I wouldn't expect to get the same speed as with native types.
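With __bool__ defined, the astype(bool) result can serve as the mask the question asked for. A small sketch under that assumption; the val attribute is again my own addition, used only to label the instances:

```python
import numpy as np

class TestClass:
    def __init__(self, val):
        self.val = val                # hypothetical label, for illustration
        self.active = bool(val % 2)

    def __bool__(self):
        # Truthiness of an instance follows its active flag.
        return self.active

    def __repr__(self):
        return 'TestClass(%s)' % self.active

items = np.array([TestClass(i) for i in range(5)])

# astype(bool) evaluates the truth value of each element, which calls __bool__.
mask = items.astype(bool)

selected = items[mask]
print(selected)
```

The trade-off is that every truth test on an instance now depends on active, which may or may not be what the rest of the code expects.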