Fast way to take the average of every N rows in a .npy array

Question

I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray and build a newArray in which each row is the average of two rows in originalArray (so newArray has half as many rows as originalArray). This should be a simple thing to do, but the script below is EXTREMELY slow. Any advice from the community would be greatly appreciated.

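import numpy as np

# originalArray: the large masked array with two columns from the question
newList = []
for i in range(0, originalArray.shape[0], 2):
    r = originalArray[i:i+2, :].mean(axis=0)  # average one pair of rows
    newList.append(r)
newArray = np.asarray(newList)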

There must be a more elegant way of doing this. Many thanks!

Recommended answer

The mean of two values a and b is 0.5*(a+b), so you can do it like this:

newArray = 0.5*(originalArray[0::2] + originalArray[1::2])

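This sums each pair of consecutive rows (rows 0 and 1, rows 2 and 3, and so on) and then multiplies every element by 0.5. It assumes an even number of rows: with an odd count, originalArray[0::2] has one more row than originalArray[1::2], and the addition fails with a broadcasting error.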
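
A quick check of the pairwise trick against the loop from the question, as a sketch. The helper names, the sample data, and the choice to drop an unpaired final row are assumptions for illustration, not part of the original answer.

```python
import numpy as np

def pairwise_mean(arr):
    """Average rows 0&1, 2&3, ...; an unpaired last row is dropped (assumption)."""
    even = arr.shape[0] - arr.shape[0] % 2  # largest even row count
    return 0.5 * (arr[0:even:2] + arr[1:even:2])

def loop_mean(arr):
    """The question's loop, restricted to complete pairs for comparison."""
    rows = [arr[i:i + 2].mean(axis=0) for i in range(0, arr.shape[0] - 1, 2)]
    return np.asarray(rows)

data = np.arange(14, dtype=float).reshape(7, 2)  # 7 rows: the last has no partner
print(pairwise_mean(data))  # -> [[1, 2], [5, 6], [9, 10]]
```

Trimming to an even length keeps both slices the same size, so the addition never hits the broadcasting error.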

Since the title asks about averaging every N rows, here is a more general solution:

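import numpy as np

def groupedAvg(myArray, N=2):
    # Running totals down the rows, keeping every Nth one (the end of each
    # complete group), already divided by N.
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    # Subtract the previous group's running total to get each group's own mean.
    result[1:] = result[1:] - result[:-1]
    return result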
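
A small sanity check, assuming a plain float array: groupedAvg (repeated with its import so the snippet runs on its own) should agree with an explicit per-group loop on the complete groups. The reference function loop_avg is hypothetical, written only for this comparison.

```python
import numpy as np

def groupedAvg(myArray, N=2):
    # Every Nth running total (end of a complete group), divided by N.
    result = np.cumsum(myArray, 0)[N-1::N] / float(N)
    # Differencing neighbouring totals leaves each group's own mean.
    result[1:] = result[1:] - result[:-1]
    return result

def loop_avg(arr, N):
    """Hypothetical reference: mean of each complete group of N rows."""
    full = arr.shape[0] // N
    return np.array([arr[g * N:(g + 1) * N].mean(axis=0) for g in range(full)])

rng = np.random.default_rng(0)
data = rng.random((10, 2))
print(groupedAvg(data, 3).shape)  # -> (3, 2); the 10th row is dropped
```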

The average of n elements is sum([x1, x2, ..., xn])/n. The sum of the elements at indices m through m+n-1 of a vector v equals cumsum(v)[m+n-1] - cumsum(v)[m-1], unless m is 0, in which case nothing is subtracted (that is result[0]).
That is what we take advantage of here: every Nth entry of the cumulative sum marks the end of a group, and subtracting the previous one leaves that group's sum. Also, since everything is linear, it does not matter where we divide by N, so we do it right at the beginning, but that is just a matter of taste.
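
The same identity written out, as a restatement of the paragraph above rather than anything new: S is the cumulative sum (S_k = cumsum(v)[k]), indices are 0-based as in NumPy, and group g covers indices gN through gN+N-1.

$$
S_k = \sum_{i=0}^{k} v_i, \qquad
\bar v_g = \frac{1}{N}\sum_{i=gN}^{gN+N-1} v_i =
\begin{cases}
\dfrac{S_{N-1}}{N}, & g = 0,\\[4pt]
\dfrac{S_{gN+N-1} - S_{gN-1}}{N}, & g \ge 1.
\end{cases}
$$

For example, with v = (1, 2, 3, 4, 5, 6) and N = 3, S = (1, 3, 6, 10, 15, 21); keeping S_2 = 6 and S_5 = 21 and differencing gives the group sums 6 and 15, i.e. the means 2 and 5.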

If the last group has fewer than N elements, it is ignored completely. If you don't want to ignore it, you have to treat the last group specially:

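import numpy as np

def avg(myArray, N=2):
    cum = np.cumsum(myArray, 0)
    # Means of the complete groups, exactly as in groupedAvg.
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]

    # Rows left over after the last complete group.
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            # Sum of the last `remainder` rows, divided by their count.
            lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
        else:
            # Fewer than N rows in total: the whole array is one short group.
            lastAvg = cum[-1]/float(remainder)
        # Append the short group's mean as an extra row (assumes 2-D input).
        result = np.vstack([result, lastAvg])

    return result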
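
The question mentions a masked array. A different technique, not from the original answer, is to reshape the complete groups into a (groups, N, columns) block and take the mean along the middle axis. With numpy.ma, MaskedArray.mean leaves masked entries out of each mean, which matches what the question's row-by-row .mean(axis=0) does, while the 0.5*(a+b) and cumsum versions do not skip masked entries in the same way. This is a sketch: the helper name reshape_avg and the decision to average leftover rows as one short group (like avg above) are assumptions.

```python
import numpy as np

def reshape_avg(arr, N=2):
    """Hypothetical helper: mean of every N rows via reshape.

    Accepts an ndarray or numpy.ma.MaskedArray of shape (rows, cols);
    masked entries are left out of each group's mean.
    """
    arr = np.ma.asanyarray(arr)            # keeps an existing mask, adds none otherwise
    rows, cols = arr.shape
    full = rows - rows % N                 # rows belonging to complete groups
    groups = arr[:full].reshape(-1, N, cols).mean(axis=1)
    if full < rows:                        # leftover rows form one short group
        tail = arr[full:].mean(axis=0)
        groups = np.ma.vstack([groups, tail])
    return groups

data = np.ma.masked_invalid(np.array([[1.0, 2.0],
                                      [3.0, np.nan],   # NaN becomes a masked entry
                                      [5.0, 6.0],
                                      [7.0, 8.0],
                                      [9.0, 10.0]]))
print(reshape_avg(data, 2))  # expected values: [2, 2], [6, 7], [9, 10]
```

The reshape returns a view rather than a copy for a contiguous array, so no Python-level loop runs over the rows.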
