Find Consecutive Repeats of Specific Length in NumPy
Question
Say that I have a NumPy array:
a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
And I have a length m = 2 that the user specifies in order to see if there are any repeats of that length within the time series. In this case, the repeats of length m = 2 are:
[2, 2]
[5, 5]
[9, 9]
[9, 9]
[13, 13]
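To pin down the definition, a plain-Python reference check may help. It slides a window of length m over the array and records every start index where all values in the window are equal, so overlapping windows (the two [9, 9] pairs above) are each counted. This is an illustrative sketch, not part of the original question; the name `naive_repeat_starts` is hypothetical.

```python
import numpy as np

def naive_repeat_starts(arr, m):
    """Hypothetical O(len(arr) * m) reference: start index of every length-m window of equal values."""
    starts = []
    for i in range(len(arr) - m + 1):
        window = arr[i:i + m]
        # A window qualifies when every element equals its first element
        if all(x == window[0] for x in window):
            starts.append(i)
    return starts

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
print(naive_repeat_starts(a, 2))  # [2, 6, 11, 12, 17, 18]
```

It is slow for long arrays but handy as a ground truth when testing a vectorized version.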
And the user can change this to m = 3, and the repeats of length m = 3 are:
[9, 9, 9]
[13, 13, 13]
I need a function that either returns the starting indices of the repeats that are found, or None if there are none. So, for m = 3, the function would return the following NumPy array of starting indices:
[11, 17]
And for m = 4, the function would return None. What's the cleanest and fastest way to accomplish this?
Update
Note that the array does not have to be sorted, and we are not interested in the result after a sort. We only want the result from the unsorted array. Your result for m = 2 should be the same for this array:
b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])
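Because a repeat is defined only by neighbouring elements, a check built on comparing each element with its successor never needs sorting. As a quick illustration (not from the original post), the adjacent-equality masks of `a` and `b` are identical, so any method built on that mask returns the same start indices for both arrays:

```python
import numpy as np

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])

# True at position i when element i equals element i + 1
mask_a = a[:-1] == a[1:]
mask_b = b[:-1] == b[1:]

print(np.array_equal(mask_a, mask_b))  # True
print(np.flatnonzero(mask_b))          # [ 2  6 11 12 17 18]
```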
Recommended Answer
Approach 1
We could leverage 1D convolution for a vectorized solution -
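import numpy as np

def consec_repeat_starts(a, n):
    N = n-1
    # m[i] is True where a[i] == a[i+1]
    m = a[:-1]==a[1:]
    # n equal values in a row show up as N consecutive True values in m
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1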
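To see where the -N+1 offset comes from, it may help to print the intermediate arrays for a and n = 3 (so N = 2). In full mode, the convolution value at position k counts the True entries in m[k-N+1 : k+1], so it equals N exactly where a run of N adjacent matches ends at k; subtracting N - 1 moves that back to the run's start. This is an illustrative trace, not part of the original answer:

```python
import numpy as np

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
n = 3
N = n - 1

m = a[:-1] == a[1:]                        # adjacent-equality mask, length len(a) - 1
c = np.convolve(m, np.ones(N, dtype=int))  # full mode: length len(m) + N - 1
ends = np.flatnonzero(c == N)              # positions where N adjacent matches end
starts = ends - N + 1                      # shift back to where the run starts

print(ends)    # [12 18]
print(starts)  # [11 17]
```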
Sample run -
In [286]: a
Out[286]:
array([ 0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12,
13, 13, 13, 14, 15])
In [287]: consec_repeat_starts(a, 2)
Out[287]: array([ 2, 6, 11, 12, 17, 18])
In [288]: consec_repeat_starts(a, 3)
Out[288]: array([11, 17])
In [289]: consec_repeat_starts(a, 4)
Out[289]: array([], dtype=int64)
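Note that the function returns an empty array for n = 4, whereas the question asked for None. If the None convention is required, a thin wrapper around the convolution-based function is enough. In this sketch, `repeat_starts_or_none` is a hypothetical name, and the explicit n < 2 check is an added assumption: for n = 1 the kernel `np.ones(0)` is empty and `np.convolve` raises a ValueError.

```python
import numpy as np

def consec_repeat_starts(a, n):
    N = n - 1
    m = a[:-1] == a[1:]
    return np.flatnonzero(np.convolve(m, np.ones(N, dtype=int)) == N) - N + 1

def repeat_starts_or_none(a, n):
    """Hypothetical wrapper: start indices as in the question, or None if there are none."""
    if n < 2:
        # Assumption: reject n < 2, for which the convolution kernel would be empty
        raise ValueError("n must be at least 2")
    starts = consec_repeat_starts(a, n)
    return starts if starts.size else None

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
print(repeat_starts_or_none(a, 3))  # [11 17]
print(repeat_starts_or_none(a, 4))  # None
```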
Approach 2
We could also use binary erosion -
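import numpy as np
from scipy.ndimage import binary_erosion  # scipy.ndimage.morphology is deprecated

def consec_repeat_starts_v2(a, n):
    N = n-1
    m = a[:-1]==a[1:]
    # Erode with a length-N all-ones structure; the footprint is centred at index N//2,
    # so shift back by N//2 to get the start index of each run
    return np.flatnonzero(binary_erosion(m,[1]*N))-(N//2)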
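For intuition about what `binary_erosion` computes here: eroding the 1D mask with a length-N all-ones structure keeps a position only if the whole footprint around it is True, which is the same as asking whether N adjacent matches are all True. The footprint is anchored at its centre, hence the `-(N//2)` correction. A pure-NumPy sketch of the same test, assuming NumPy 1.20 or later for `sliding_window_view`, reports window starts directly, so no shift is needed. It is an alternative for illustration, not part of the original answer, and the function name is hypothetical:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def consec_repeat_starts_windows(a, n):
    """Hypothetical pure-NumPy variant: start indices of n consecutive equal values."""
    N = n - 1
    m = a[:-1] == a[1:]
    if len(m) < N:
        # Fewer than N adjacent pairs: no run of N matches can exist
        return np.array([], dtype=np.intp)
    # Row i of the view is m[i:i + N]; keep the rows that are entirely True
    return np.flatnonzero(sliding_window_view(m, N).all(axis=1))

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])
print(consec_repeat_starts_windows(a, 3))  # [11 17]
```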