Improving performance of operations on a NumPy array

Question

I'm using numpy.delete inside a while loop to remove elements from an array. The loop runs only while the array is not empty. This code works fine, but it slows down considerably when the array has more than 1e6 elements. Here is an example:

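import numpy as np

while array.shape[0] > 0:
    # x and y are set elsewhere; if they never change, the loop may never end
    ix = np.where((array >= x) & (array <= y))[0]
    array = np.delete(array, ix, None)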

I've tried to make this code efficient, but I cannot find a good way to speed up the while loop. I think the bottleneck is delete, which must involve a copy of some kind. I've tried using masked arrays to avoid copying, but I'm not that good at Python, and masked arrays are not that easy to search for. Is there a good, fast way to use delete, or to replace it, so that the loop above can handle 7e6 elements without taking 24 hours?

Thanks

Answer

So you can substantially improve the performance of your code by:

  • eliminating the loop; and

  • avoiding the delete operations (which cause a copy of the original array).
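Both points can be sketched in plain NumPy with a single boolean mask. This is a minimal sketch, not the answer's own method: it assumes a 1-D array and fixed bounds, and the data and the values of x and y below are hypothetical. It builds the filtered array in one vectorized pass instead of calling np.delete over and over.

```python
import numpy as np

# Hypothetical data and bounds; in the question, array, x and y come from elsewhere.
array = np.array([7, 5, 4, 8, 2, 3, 9, 1])
x, y = 3, 6

# One vectorized comparison marks every element inside [x, y].
inside = (array >= x) & (array <= y)

# Boolean indexing keeps everything else in a single pass: one new array,
# no Python loop, and no repeated np.delete calls copying the whole array.
kept = array[~inside]
print(kept)
```

If x and y really do change on every pass (the question does not show how), one option is to keep the loop but accumulate a boolean mask over the original array, e.g. `removed |= (array >= x) & (array <= y)`, so that no pass has to copy the array.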

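The NumPy 1.7 development version introduced a new mask that is far easier to use than the original numpy.ma masked arrays; it was also meant to perform much better because it is part of the NumPy core array object. I think this might be useful to you because by using it you can avoid the expensive delete operation. Note that this NA mask was removed before the final 1.7.0 release, so NA, maskna and skipna are not available in released versions of NumPy.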

In other words, instead of deleting the array elements you don't want, just mask them. This has been suggested in other answers, but I am suggesting the new mask.
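In released NumPy versions, the same mask-instead-of-delete idea can be sketched with the long-standing numpy.ma module. This is a sketch under assumptions: a 1-D array with hypothetical data and hypothetical bounds x and y. np.ma.masked_inside masks every value v with x <= v <= y, reductions skip the masked entries, and compressed() returns the unmasked values, which is what the delete loop would leave behind for fixed bounds.

```python
import numpy as np

# Hypothetical data and bounds standing in for the question's array, x and y.
array = np.array([7, 5, 4, 8, 2, 3, 9, 1])
x, y = 3, 6

# Mask every element with x <= value <= y instead of deleting it.
masked = np.ma.masked_inside(array, x, y)

# Reductions on a masked array ignore the masked entries.
total = masked.sum()

# compressed() returns the unmasked values as an ordinary 1-D array.
remaining = masked.compressed()
print(masked.mask, total, remaining)
```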

To use NA, import it:

>>> from numpy import NA

Then, for a given array, set its maskna flag to True:

>>> A.flags.maskna = True

Alternatively, most array constructors in 1.7 accept a maskna parameter, which you can set to True. With the mask enabled, assign NA to any element you want to mask:

>>> A[3,3] = NA
>>> A
array([[7, 5, 4, 8, 4],
       [2, 4, 3, 7, 3],
       [3, 1, 3, 2, 1],
       [8, 2, 0, NA, 7],
       [0, 7, 2, 5, 5],
       [5, 4, 2, 7, 4],
       [1, 2, 9, 2, 3],
       [7, 5, 1, 2, 9]])

>>> A.sum(axis=0)
array([33, 30, 24, NA, 36])

Often this is not what you want; usually you still want the sum of that column, with the NA treated as if it were 0.

To get that behavior, pass True for the skipna parameter (in NumPy 1.7, most reduction methods, such as sum, accept it):

>>> A.sum(axis=0, skipna=True)
array([33, 30, 24, 33, 36])
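The propagate-versus-skip behaviour of NA and skipna can be reproduced with released NumPy features. The sketch below reuses the 8x5 matrix shown above with two stand-ins for the NA: a numpy.ma masked entry, whose reductions skip masked values by default, and a float NaN, where sum propagates the NaN and np.nansum skips it. The value originally stored at A[3, 3] is not shown, so the 0 used there is a hypothetical placeholder (it is masked or overwritten anyway).

```python
import numpy as np

# The 8x5 matrix from the example above; the value at [3, 3] is a placeholder.
data = np.array([[7, 5, 4, 8, 4],
                 [2, 4, 3, 7, 3],
                 [3, 1, 3, 2, 1],
                 [8, 2, 0, 0, 7],
                 [0, 7, 2, 5, 5],
                 [5, 4, 2, 7, 4],
                 [1, 2, 9, 2, 3],
                 [7, 5, 1, 2, 9]])

# Stand-in 1: a masked array; masked entries are skipped by reductions.
A = np.ma.masked_array(data)
A[3, 3] = np.ma.masked
masked_sums = A.sum(axis=0)

# Stand-in 2: NaN in a float copy; sum propagates it, nansum skips it.
F = data.astype(float)
F[3, 3] = np.nan
propagated = F.sum(axis=0)       # analogous to A.sum(axis=0) without skipna
skipped = np.nansum(F, axis=0)   # analogous to A.sum(axis=0, skipna=True)
print(masked_sums, propagated, skipped)
```

NaN only works for floating-point data, which is one reason numpy.ma is the closer match for the integer arrays used in this answer.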

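In sum, to speed up your code, eliminate the loop and use the new mask. For the question's range the condition would be (A >= x) & (A <= y); the example below, on a different array A, simply masks every element <= 3: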

>>> A[A <= 3] = NA

>>> A
array([[8, 8, 4, NA, NA],
       [7, 9, NA, NA, 8],
       [NA, 6, 9, 5, NA],
       [9, 4, 6, 6, 5],
       [NA, 6, 8, NA, NA],
       [8, 5, 7, 7, NA],
       [NA, 4, 5, 9, 9],
       [NA, 8, NA, 5, 9]])

In this context (with skipna=True) the NA placeholders behave like 0s, which I believe is what you want:

>>> A.sum(axis=0, skipna=True)
array([32, 50, 39, 32, 31])
