Fast way to take the average of every N rows in a .npy array
Problem description
I have a very large masked NumPy array (originalArray) with many rows and two columns. I want to take the average of every two rows in originalArray and build a newArray in which each row is the average of two rows in originalArray (so newArray has half as many rows as originalArray). This should be a simple thing to do, but the script below is EXTREMELY slow. Any advice from the community would be greatly appreciated.
newList = []
for i in range(0, originalArray.shape[0], 2):
    r = originalArray[i:i+2,:].mean(axis=0)
    newList.append(r)
newArray = np.asarray(newList)
There must be a more elegant way of doing this. Many thanks!
Recommended answer
The mean of two values a and b is 0.5*(a+b). Therefore you can do it like this:
newArray = 0.5*(originalArray[0::2] + originalArray[1::2])
This adds each pair of consecutive rows and then multiplies every element by 0.5.
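As a quick check of the one-liner, here is a minimal sketch on a small made-up array (the values are illustrative, not from the question). It assumes an even number of rows, since otherwise the two slices have different lengths, and it compares the result against the loop from the question.

```python
import numpy as np

# Hypothetical 4x2 input; the name originalArray follows the question.
originalArray = np.array([[1.0, 2.0],
                          [3.0, 4.0],
                          [5.0, 6.0],
                          [7.0, 8.0]])

# Even-indexed rows plus odd-indexed rows, halved, in one vectorized step.
newArray = 0.5 * (originalArray[0::2] + originalArray[1::2])

# Reference result computed with the slow loop from the question.
loopResult = np.asarray([originalArray[i:i + 2, :].mean(axis=0)
                         for i in range(0, originalArray.shape[0], 2)])

print(newArray)
```

If originalArray is masked, as in the question, keep in mind that adding two masked arrays masks the sum wherever either operand is masked, whereas .mean(axis=0) skips masked entries, so the two approaches may differ on pairs that contain masked values.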
Since the title asks for averaging over N rows, here is a more general solution:
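import numpy as np

def groupedAvg(myArray, N=2):
    # Cumulative sum along the rows, sampled at the last row of each full group.
    result = np.cumsum(myArray, 0)[N-1::N]/float(N)
    # Subtract the previous group's value to get each group's own average.
    result[1:] = result[1:] - result[:-1]
    return result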
The general form of the average over n elements is sum([x1,x2,...,xn])/n.
The sum of elements m through m+n of a vector v equals the (m+n)th element of cumsum(v) minus its (m-1)th element. If m is 0, nothing is subtracted (that is the result[0] case).
That is what we take advantage of here. Also, since everything is linear, it does not matter where we divide by N, so we do it right at the beginning, but that is just a matter of taste.
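To make the cumulative-sum argument concrete, the following sketch walks through it on a small 1-D vector. The vector and the intermediate names (c, ends, group_sums, group_means) are chosen here for illustration and do not appear in the answer.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # illustrative data
N = 2

c = np.cumsum(v)        # running totals: [1, 3, 6, 10, 15, 21]
ends = c[N - 1::N]      # total up to the end of each group: [3, 10, 21]

# Each group's sum is its end total minus the previous group's end total;
# the first group has nothing to subtract.
group_sums = ends.copy()
group_sums[1:] -= ends[:-1]     # [3, 7, 11]

group_means = group_sums / N    # [1.5, 3.5, 5.5]
print(group_means)
```

Dividing by N before or after the subtraction gives the same result, which is the linearity point made above.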
If the last group has fewer than N elements, it is ignored completely.
If you don't want to ignore it, you have to treat the last group specially:
def avg(myArray, N=2):
    cum = np.cumsum(myArray,0)
    # Averages of all complete groups of N rows.
    result = cum[N-1::N]/float(N)
    result[1:] = result[1:] - result[:-1]

    # Number of leftover rows that do not fill a complete group.
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            lastAvg = (cum[-1]-cum[-1-remainder])/float(remainder)
        else:
            # Fewer than N rows in total: the whole array is the last group.
            lastAvg = cum[-1]/float(remainder)
        result = np.vstack([result, lastAvg])

    return result
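The sketch below exercises avg on a made-up 5x2 array, so the last group is incomplete, and compares it with a plain loop. The function body is copied from the answer with indentation restored; the test data and the loop_avg helper are hypothetical and only serve as a reference.

```python
import numpy as np

def avg(myArray, N=2):
    # Copied from the answer above.
    cum = np.cumsum(myArray, 0)
    result = cum[N-1::N] / float(N)
    result[1:] = result[1:] - result[:-1]
    remainder = myArray.shape[0] % N
    if remainder != 0:
        if remainder < myArray.shape[0]:
            lastAvg = (cum[-1] - cum[-1-remainder]) / float(remainder)
        else:
            lastAvg = cum[-1] / float(remainder)
        result = np.vstack([result, lastAvg])
    return result

def loop_avg(a, N=2):
    # Hypothetical reference: slow but obviously correct group means.
    return np.asarray([a[i:i + N].mean(axis=0)
                       for i in range(0, a.shape[0], N)])

# Illustrative input: rows [0,1], [2,3], [4,5], [6,7], [8,9].
data = np.arange(10, dtype=float).reshape(5, 2)

print(avg(data, 2))  # two full groups plus the leftover row
print(avg(data, 3))  # one full group plus a leftover group of two rows
```

Note that avg stacks rows with np.vstack, so as written it is aimed at 2-D input like the two-column array in the question.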