How to count continuous numbers in numpy
Problem description
I have a NumPy one-dimensional array of 1s and 0s, for example:
a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
I want to count the consecutive 0s and 1s in the array and output something like this:
[1,3,7,1,1,2,3,2,2]
What I'm doing at the moment is:
np.diff(np.where(np.abs(np.diff(a)) == 1)[0])
which outputs
array([3, 7, 1, 1, 2, 3, 2])
As you can see, it is missing the first count, 1 (and the last count, 2, as well).
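Both ends are lost because the expression only records positions where the value changes. The start and end of the array are never treated as run boundaries, so the first and last runs have nothing to be measured against. A minimal sketch of patching the original expression by padding those positions with -1 and a.size - 1. The helper name `count_runs_patched` is hypothetical, and `!= 0` is swapped in for `np.abs(...) == 1` so the sketch also works for values other than 0 and 1:

```python
import numpy as np

def count_runs_patched(a):
    # Indices i where a[i] != a[i + 1], i.e. the last index of every run except the final one
    change = np.where(np.diff(a) != 0)[0]
    # Treat -1 as the "end" of an imaginary run before the array,
    # and a.size - 1 as the end of the final run
    ends = np.concatenate(([-1], change, [a.size - 1]))
    # Differences between consecutive run ends are the run lengths
    return np.diff(ends)

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
print(count_runs_patched(a))  # [1 3 7 1 1 2 3 2 2]
```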
I've tried np.split and then getting the sizes of each segment, but it does not seem to be optimal.
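For comparison, a sketch of what the np.split route might look like (a reconstruction; the question does not show the exact code that was tried). It produces the right counts, but it materializes one sub-array per run in a Python list, so its cost grows with the number of runs instead of staying inside a few vectorized calls:

```python
import numpy as np

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

# Split points: the first index of every run except the first run
split_points = np.flatnonzero(np.diff(a)) + 1
# np.split returns a Python list of sub-arrays, one per run
segments = np.split(a, split_points)
# The run lengths are simply the sizes of the segments
counts = [seg.size for seg in segments]
print(counts)  # [1, 3, 7, 1, 1, 2, 3, 2, 2]
```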
Is there a more elegant "pythonic" solution?
Recommended answer
Here's one vectorized approach -
np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Sample run -
In [208]: a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
In [209]: np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Out[209]: array([1, 3, 7, 1, 1, 2, 3, 2, 2])
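The one-liner builds the list of run start positions: index 0, every index right after a change (np.flatnonzero(np.diff(a)) + 1), and finally a.size as a sentinel end. Consecutive differences of that list are the run lengths. A sketch wrapping it in a function that also returns the value of each run; the name `run_lengths` is hypothetical, and the sketch assumes a non-empty 1-D array:

```python
import numpy as np

def run_lengths(a):
    """Return (values, lengths) for each run of equal consecutive elements.

    Assumes a is a non-empty 1-D array.
    """
    a = np.asarray(a)
    # Start index of every run: 0, then each position right after a change
    starts = np.r_[0, np.flatnonzero(np.diff(a)) + 1]
    # Append a.size so the last run has an end, then take differences
    lengths = np.diff(np.r_[starts, a.size])
    # The value of each run is the element at its start index
    values = a[starts]
    return values, lengths

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
values, lengths = run_lengths(a)
print(values)   # [0 1 0 1 0 1 0 1 0]
print(lengths)  # [1 3 7 1 1 2 3 2 2]
```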
Another one, with boolean concatenation -
np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
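Here the mask has one entry per element plus a trailing entry: position i is True when a new run starts at index i, and the final True marks the end of the array. np.flatnonzero on that mask yields the same run boundaries that the first approach builds with np.r_. A short sketch that prints the intermediate values for a small input:

```python
import numpy as np

a = np.array([0, 1, 1, 1, 0, 0])

# True at index 0, at every index where a differs from its predecessor, and at a.size
mask = np.concatenate(([True], a[1:] != a[:-1], [True]))
print(mask)             # [ True  True False False  True False  True]

# Positions of the True entries are the run boundaries
bounds = np.flatnonzero(mask)
print(bounds)           # [0 1 4 6]

# Gaps between consecutive boundaries are the run lengths
print(np.diff(bounds))  # [1 3 2]
```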
Runtime test
For the setup, let's create a bigger dataset with islands of 0s and 1s and, for a fair benchmark matching the given sample, let's have the island lengths vary between 1 and 6 (np.random.randint(1,7,n) excludes its upper bound 7) -
In [257]: n = 100000 # thus would create 100000 islands, i.e. 50000 pairs
In [258]: a = np.repeat(np.arange(n)%2, np.random.randint(1,7,(n)))
# Approach #1 proposed in this post
In [259]: %timeit np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
100 loops, best of 3: 2.13 ms per loop
# Approach #2 proposed in this post
In [260]: %timeit np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
1000 loops, best of 3: 1.21 ms per loop
# @Vineet Jain's soln (needs: from itertools import groupby)
In [261]: %timeit [ sum(1 for i in g) for k,g in groupby(a)]
10 loops, best of 3: 61.3 ms per loop
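To reproduce the comparison outside IPython, and to confirm that the three methods agree before comparing speed, a plain-script sketch using timeit. The fixed random seed, the use of np.random.default_rng, and the function names are assumptions of this sketch; absolute timings will differ by machine and NumPy version, so none are quoted here:

```python
import timeit
from itertools import groupby

import numpy as np

# Same kind of data as the benchmark: n alternating islands, each 1 to 6 long.
# The fixed seed is an assumption added so the check is reproducible.
rng = np.random.default_rng(0)
n = 100000
a = np.repeat(np.arange(n) % 2, rng.integers(1, 7, n))

def approach1(a):
    return np.diff(np.r_[0, np.flatnonzero(np.diff(a)) + 1, a.size])

def approach2(a):
    return np.diff(np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True]))))

def groupby_soln(a):
    return [sum(1 for _ in g) for _, g in groupby(a)]

# All three should report exactly one count per island
r1, r2, r3 = approach1(a), approach2(a), groupby_soln(a)
assert np.array_equal(r1, r2) and r1.tolist() == r3
assert r1.size == n

for name, fn in [("approach1", approach1), ("approach2", approach2), ("groupby", groupby_soln)]:
    t = timeit.timeit(lambda: fn(a), number=10) / 10
    print(f"{name}: {t * 1e3:.2f} ms per call")
```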