How to use mask indexing on numpy arrays of classes?

Question

When working with a numpy array of custom class instances like:

class TestClass:
    active = False

How can I use inline masking (boolean index arrays) as described here: http://docs.scipy.org/doc/numpy/user/basics.indexing.html#boolean-or-mask-index-arrays

A straightforward attempt fails:

items = np.array([TestClass() for _ in range(10)])
items[items.active]

  AttributeError: 'numpy.ndarray' object has no attribute 'active'

Any suggestions?

Recommended answer

So your array is dtype=object (print it) and each element points to an instance of your class:

items = np.array([TestClass() for _ in range(10)])

Now try:

items.active

items is an array; active is an attribute of your class, not an attribute of the array that holds your objects. Your definition does not add any functionality to the ndarray class. The error isn't in the masking; it's in trying to get the instance attribute from the array.

Many operations on arrays like this have to be done iteratively. This kind of array is similar to a plain Python list.

[obj.active for obj in items]

or turn that back into an array:

np.array([obj...])

items[[True,False,True,...]] should work, but that's because the mask is already a boolean list or array.
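Putting those pieces together, a minimal sketch of the list-comprehension approach might look like the following. It keeps the class from the question; flipping every other instance to active is only an assumption made here so that the mask selects something.

```python
import numpy as np

class TestClass:
    active = False  # class-level default, as in the question

# Object array: each element is a reference to a TestClass instance.
items = np.array([TestClass() for _ in range(5)], dtype=object)

# Hypothetical setup for illustration: activate every other instance.
for obj in items[::2]:
    obj.active = True

# Build the boolean mask element by element, then index with it.
mask = np.array([obj.active for obj in items], dtype=bool)
selected = items[mask]
```

The mask is computed in a plain Python loop; only the final indexing step is a numpy operation.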

===================

Let's modify your class so it shows something interesting. Note that I am assigning active to instances, not, as you did, to the class:

In [1671]: class TestClass:
      ...:     def __init__(self,val):
      ...:        self.active = bool(val%2)

In [1672]: items = np.array([TestClass(i) for i in range(10)])

In [1674]: items
Out[1674]: 
array([<__main__.TestClass object at 0xb106758c>,
       <__main__.TestClass object at 0xb117764c>,
       ...
       <__main__.TestClass object at 0xb269850c>], dtype=object)
# print of the array isn't interesting.  The class needs a `__str__` method.

This simple iterative access to the attribute works:

In [1675]: [i.active for i in items]
Out[1675]: [False, True, False, True, False, True, False, True, False, True]

np.frompyfunc provides a more powerful way of accessing each element of an array. operator.attrgetter('active')(i) is a functional way of doing i.active (it requires import operator).

In [1676]: f=np.frompyfunc(operator.attrgetter('active'),1,1)
In [1677]: f(items)
Out[1677]: array([False, True, False, True, False, True, False, True, False, True], dtype=object)

But the main advantage of this function appears when I change the shape of the array:

In [1678]: f(items.reshape(2,5))
Out[1678]: 
array([[False, True, False, True, False],
       [True, False, True, False, True]], dtype=object)

Note that this array is dtype object. That's what frompyfunc does. To get an array of booleans we need to change the type:

In [1679]: f(items.reshape(2,5)).astype(bool)
Out[1679]: 
array([[False,  True, False,  True, False],
       [ True, False,  True, False,  True]], dtype=bool)
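As a self-contained sketch of how that boolean array can then serve as the mask the question asks for (the names grid, get_active and active_items are chosen here for illustration and are not from the original session):

```python
import operator
import numpy as np

class TestClass:
    def __init__(self, val):
        self.active = bool(val % 2)

items = np.array([TestClass(i) for i in range(10)], dtype=object)
grid = items.reshape(2, 5)

# frompyfunc applies the getter to every element and keeps the input shape;
# the result is dtype=object, so cast it before using it as a mask.
get_active = np.frompyfunc(operator.attrgetter('active'), 1, 1)
mask = get_active(grid).astype(bool)

# Boolean indexing on a 2-D array returns the selected elements as a 1-D array.
active_items = grid[mask]
```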

np.vectorize uses frompyfunc internally and makes the result dtype a little more user friendly, but in timings it's a bit slower.
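For comparison, a rough sketch of the np.vectorize variant. Passing otypes=[bool] asks for a boolean result directly, so no astype call is needed; the lambda and the variable names are assumptions made for this example.

```python
import numpy as np

class TestClass:
    def __init__(self, val):
        self.active = bool(val % 2)

items = np.array([TestClass(i) for i in range(10)], dtype=object)

# otypes fixes the output dtype up front instead of inferring it.
is_active = np.vectorize(lambda obj: obj.active, otypes=[bool])
mask = is_active(items)
active_items = items[mask]
```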

===============

Expanding on Jon's comment:

In [1702]: class TestClass:
      ...:     def __init__(self,val):
      ...:        self.active = bool(val%2)
      ...:     def __bool__(self):
      ...:         return self.active
      ...:     def __str__(self):
      ...:         return 'TestClass(%s)'%self.active
      ...:     def __repr__(self):
      ...:         return str(self)

In [1707]: items = np.array([TestClass(i) for i in range(5)])

items now displays in an informative manner, and converts to strings:

In [1708]: items
Out[1708]: 
array([TestClass(False), TestClass(True), TestClass(False),
       TestClass(True), TestClass(False)], dtype=object)
In [1709]: items.astype('S20')
Out[1709]: 
array([b'TestClass(False)', b'TestClass(True)', b'TestClass(False)',
       b'TestClass(True)', b'TestClass(False)'], 
      dtype='|S20')

and converts to bool:

In [1710]: items.astype(bool)
Out[1710]: array([False,  True, False,  True, False], dtype=bool)

In effect, astype applies the conversion method to each element of the array. We could also define __int__, __add__, and so on. This shows that it is easier to add functionality to the custom class than to the array class itself. I wouldn't expect to get the same speed as with native types.
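With __bool__ defined, the converted array can be used directly as the mask. A short sketch under the same class definition (the variable names are illustrative, and __str__ is omitted for brevity):

```python
import numpy as np

class TestClass:
    def __init__(self, val):
        self.active = bool(val % 2)
    def __bool__(self):
        # Truthiness of an instance follows its active flag.
        return self.active
    def __repr__(self):
        return 'TestClass(%s)' % self.active

items = np.array([TestClass(i) for i in range(5)], dtype=object)

# astype(bool) evaluates the truthiness of each element, i.e. calls __bool__.
mask = items.astype(bool)
active_items = items[mask]
```

This comes close to the items[items.active] spelling the question was after, at the cost of tying the truth value of every instance to its active flag.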
