numpy: cumulative multiplicity count
Problem description
I have a sorted array of ints that might contain repetitions. I would like to count consecutive equal values, restarting from zero whenever a value differs from the previous one. Here is the expected behavior, implemented with a simple Python loop:
import numpy as np

def count_multiplicities(a):
    # r[i] = how many elements equal to a[i] appear immediately before position i
    r = np.zeros(a.shape, dtype=a.dtype)
    for i in range(1, len(a)):
        if a[i] == a[i-1]:
            r[i] = r[i-1] + 1   # same value: extend the current run
        else:
            r[i] = 0            # new value: restart the count
    return r

a = (np.random.rand(20)*5).astype(dtype=int)
a.sort()
print("given sorted array: ", a)
print("multiplicity count: ", count_multiplicities(a))
Output:
given sorted array: [0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4]
multiplicity count: [0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 0 1 2 3]
How can I get the same result efficiently with NumPy? The array is very long, but each value repeats only a few times (say, no more than ten).
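For a general sorted array (any values, gaps allowed), one possible vectorized sketch, not taken from the original post, uses np.unique: in a sorted array the first occurrence of each distinct value is exactly where its run begins, so each element's count is its offset from that position. The function name count_multiplicities_unique is hypothetical, and the input is assumed to be sorted and 1-D.

```python
import numpy as np

def count_multiplicities_unique(a):
    """Offset of each element within its run of equal values (a assumed sorted, 1-D)."""
    a = np.asarray(a)
    # first[i]: index of the first occurrence of the i-th distinct value
    # inv[j]:   which distinct value element j belongs to
    _, first, inv = np.unique(a, return_index=True, return_inverse=True)
    # Because a is sorted, first[inv[j]] is the start of the run containing j.
    return np.arange(a.size) - first[inv]

a = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
print(count_multiplicities_unique(a))  # [0 1 2 0 1 0 1 2 3]
```

np.unique sorts internally, so this costs O(n log n); the answer below avoids that by comparing neighbours directly.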
In my particular case, I also know that the values start from zero and that consecutive values differ by either 0 or 1 (there are no gaps).
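When the values are small non-negative integers, as in this special case, a sketch based on np.bincount (an alternative not from the original post; the function name is hypothetical) can compute where each value's run starts without comparing neighbours: the start of value v's run is the total count of all smaller values. This assumes the array is sorted and non-negative; gaps would simply get a zero count and do no harm.

```python
import numpy as np

def count_multiplicities_bincount(a):
    """Multiplicity count for a sorted array of non-negative ints (assumption)."""
    a = np.asarray(a)
    # counts[v] = number of occurrences of value v (0 if v does not occur)
    counts = np.bincount(a)
    # starts[v] = index at which the run of value v begins in the sorted array
    starts = np.cumsum(counts) - counts
    # Each element's count is its distance from the start of its run.
    return np.arange(a.size) - starts[a]

a = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
print(count_multiplicities_bincount(a))  # [0 1 2 0 1 0 1 2 3]
```

The intermediate arrays have length max(a) + 1, so this is only sensible when the largest value is not much bigger than the array length.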
Recommended answer
Here's a cumsum-based vectorized approach:
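def count_multiplicities_cumsum_vectorized(a):
    out = np.ones(a.size, dtype=int)
    if a.size == 0:
        return out
    out[0] = 0
    # Indices where a new run of equal values begins (excluding index 0)
    idx = np.flatnonzero(a[1:] != a[:-1]) + 1
    if idx.size:
        # At each run start, subtract the count reached in the previous run
        # so that the cumulative sum drops back to 0
        out[idx[0]] = -idx[0] + 1
        out[idx[1:]] = idx[:-1] - idx[1:] + 1
    np.cumsum(out, out=out)
    return out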
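To see why the cumsum trick works, the sketch below repeats the answer's steps on a small array and prints the step array before it is accumulated. Inside a run every step is +1; at a run start the step cancels the count reached so far. Here the answer's two assignments (for idx[0] and for idx[1:]) are folded into one by treating 0 as the "previous run start" of the first change point; the variable name prev_start is illustrative.

```python
import numpy as np

a = np.array([0, 0, 1, 1, 1, 2])

steps = np.ones(a.size, dtype=int)         # +1 per element inside a run
steps[0] = 0                               # the count starts at 0
idx = np.flatnonzero(a[1:] != a[:-1]) + 1  # run starts after index 0: [2 5]
# The count just before idx[k] is (idx[k] - 1) - prev_start[k],
# so adding prev_start[k] - idx[k] + 1 brings the running sum back to 0.
prev_start = np.concatenate(([0], idx[:-1]))
steps[idx] = prev_start - idx + 1
print(steps)             # [ 0  1 -1  1  1 -2]
print(np.cumsum(steps))  # [0 1 0 1 2 0]
```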
Sample run:
In [58]: a
Out[58]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4])
In [59]: count_multiplicities(a) # Original approach
Out[59]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])
In [60]: count_multiplicities_cumsum_vectorized(a)
Out[60]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2])
Runtime test:
In [66]: a = (np.random.rand(200000)*1000).astype(dtype=int)
...: a.sort()
...:
In [67]: a
Out[67]: array([ 0, 0, 0, ..., 999, 999, 999])
In [68]: %timeit count_multiplicities(a)
10 loops, best of 3: 87.2 ms per loop
In [69]: %timeit count_multiplicities_cumsum_vectorized(a)
1000 loops, best of 3: 739 µs per loop
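A different vectorized technique, not part of the original answer, replaces the cumsum bookkeeping with a running maximum: write each run's start index at that position, propagate it forward with np.maximum.accumulate, and subtract it from the element's position. The function name is hypothetical. Because it only compares adjacent elements, like the cumsum version, it also counts consecutive equal values in an unsorted array. No timing comparison is claimed here.

```python
import numpy as np

def count_multiplicities_maxacc(a):
    """Count consecutive equal values using a running maximum of run starts."""
    a = np.asarray(a)
    positions = np.arange(a.size)
    if a.size == 0:
        return positions
    # Mark each run start (after index 0) with its own index, everything else with 0.
    starts = np.zeros(a.size, dtype=positions.dtype)
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts[change] = change
    # The running maximum carries each run's start index across the whole run.
    return positions - np.maximum.accumulate(starts)

a = np.array([0, 0, 1, 1, 1, 2])
print(count_multiplicities_maxacc(a))  # [0 1 0 1 2 0]
```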