How to count continuous numbers in numpy
Question
I have a one-dimensional NumPy array of 1s and 0s, for example:
a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
I want to count the consecutive 0s and 1s in the array and output something like this:
[1,3,7,1,1,2,3,2,2]
What I am doing at the moment is:
np.diff(np.where(np.abs(np.diff(a)) == 1)[0])
which outputs:
array([3, 7, 1, 1, 2, 3, 2])
As you can see, it is missing the first count, 1.
I've tried np.split and then taking the size of each segment, but that does not seem optimal.
Is there a more elegant, "pythonic" solution?
Recommended answer
Here's one vectorized approach -
np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Sample run -
In [208]: a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
In [209]: np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
Out[209]: array([1, 3, 7, 1, 1, 2, 3, 2, 2])
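To make the one-liner easier to follow, here is a minimal sketch that splits it into its intermediate steps on the sample array. The variable names (run_starts, boundaries, run_lengths) are introduced here only for illustration and are not part of the original answer. It also suggests why the attempt in the question comes up short: differencing only the change positions measures the interior runs, so the run before the first change and the run after the last change are both dropped.

```python
import numpy as np

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])

# np.diff(a) is non-zero at index i whenever a[i] != a[i+1];
# adding 1 turns that into the index where each new run starts.
run_starts = np.flatnonzero(np.diff(a)) + 1

# Pad with 0 (start of the first run) and a.size (one past the last run).
boundaries = np.r_[0, run_starts, a.size]

# The gaps between consecutive boundaries are the run lengths.
run_lengths = np.diff(boundaries)

print(boundaries)   # [ 0  1  4 11 12 13 15 18 20 22]
print(run_lengths)  # [1 3 7 1 1 2 3 2 2]
```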
Faster one with boolean concatenation -
np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
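If the value of each run is wanted alongside its length, the same boolean mask can be reused. The following is a sketch under assumptions, not part of the original answer: the helper name run_length_encode is hypothetical, and the guard for empty input is an addition, since the bare expression would report a single run of length 1 for an empty array.

```python
import numpy as np

def run_length_encode(a):
    """Return (values, lengths) of the consecutive runs in a 1-D array.

    Hypothetical helper built on the boolean-concatenation expression.
    """
    a = np.asarray(a)
    if a.size == 0:
        # Added guard: without it the two sentinel Trues would yield length 1.
        return a[:0], np.array([], dtype=np.intp)
    # True at the start of every run, plus a sentinel True one past the end.
    is_boundary = np.concatenate(([True], a[1:] != a[:-1], [True]))
    boundaries = np.flatnonzero(is_boundary)
    lengths = np.diff(boundaries)
    values = a[boundaries[:-1]]  # first element of each run
    return values, lengths

a = np.array([0,1,1,1,0,0,0,0,0,0,0,1,0,1,1,0,0,0,1,1,0,0])
values, lengths = run_length_encode(a)
print(values)   # [0 1 0 1 0 1 0 1 0]
print(lengths)  # [1 3 7 1 1 2 3 2 2]
```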
Runtime test
For the setup, let's create a bigger dataset with islands of 0s and 1s and, for fair benchmarking against the given sample, let the island lengths vary between 1 and 7 (note that np.random.randint(1,7) excludes the upper bound, so the lengths actually drawn are 1 to 6) -
In [257]: n = 100000 # creates 100000 islands in total (50000 of 0s and 50000 of 1s)
In [258]: a = np.repeat(np.arange(n)%2, np.random.randint(1,7,(n)))
# Approach #1 proposed in this post
In [259]: %timeit np.diff(np.r_[0,np.flatnonzero(np.diff(a))+1,a.size])
100 loops, best of 3: 2.13 ms per loop
# Approach #2 proposed in this post
In [260]: %timeit np.diff(np.flatnonzero(np.concatenate(([True], a[1:]!= a[:-1], [True] ))))
1000 loops, best of 3: 1.21 ms per loop
# @Vineet Jain's soln (needs: from itertools import groupby)
In [261]: %timeit [ sum(1 for i in g) for k,g in groupby(a)]
10 loops, best of 3: 61.3 ms per loop
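The timings above only compare speed, so it may be worth confirming that the three expressions return the same counts. The check below is a sketch on a smaller dataset built the same way. The seed, the smaller n, and the use of np.random.default_rng in place of np.random.randint are choices made here for reproducibility, not part of the original benchmark.

```python
from itertools import groupby

import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, assumed for reproducibility
n = 1000                        # smaller than the benchmark to keep it quick
a = np.repeat(np.arange(n) % 2, rng.integers(1, 7, n))  # lengths 1..6

approach1 = np.diff(np.r_[0, np.flatnonzero(np.diff(a)) + 1, a.size])
approach2 = np.diff(np.flatnonzero(np.concatenate(([True], a[1:] != a[:-1], [True]))))
via_groupby = [sum(1 for _ in g) for _, g in groupby(a)]

print(np.array_equal(approach1, approach2))  # True
print(approach1.tolist() == via_groupby)     # True
print(len(approach1))                        # 1000, one count per island
```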