Find Consecutive Repeats of Specific Length in NumPy

Problem Description

Say that I have a NumPy array:

a = np.array([0, 1, 2, 2, 3, 4, 5, 5, 6, 7, 8, 9, 9, 9, 10, 11, 12, 13, 13, 13, 14, 15])

And I have a length m = 2 that the user specifies in order to see if there are any repeats of that length within the time series. In this case, the repeats of length m = 2 are:

[2, 2]
[5, 5]
[9, 9]
[9, 9]
[13, 13]
[13, 13]

And the user can change this to m = 3 and the repeats of length m = 3 are:

[9, 9, 9]
[13, 13, 13]

I need a function that returns the starting indices of the repeats it finds, or None if there are none. So, for m = 3 the function would return the following NumPy array of starting indices:

[11, 17]

And for m = 4 the function would return None. What's the cleanest and fastest way to accomplish this?

Update: Note that the array does not have to be sorted, and we are not interested in the result after sorting. We only want the result from the unsorted array. The result for m = 2 should be the same for this array:

b = np.array([0, 11, 2, 2, 3, 40, 5, 5, 16, 7, 80, 9, 9, 9, 1, 11, 12, 13, 13, 13, 4, 5])
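Because the update stresses that the original order matters, it can help to pin down the expected behaviour with a plain loop before optimizing. The sketch below is a hypothetical reference helper, not part of the original post: it checks every length-m window directly, counts overlapping windows (so three 9s give two length-2 matches), and returns None when nothing matches. It is O(len(a)·m), so it serves as a correctness baseline rather than the fast solution.

```python
import numpy as np

def repeat_starts_loop(arr, m):
    """Reference check: start index of every length-m window of equal values.

    Returns a NumPy array of start indices, or None if no window matches.
    The name repeat_starts_loop is hypothetical, used only for illustration.
    """
    arr = np.asarray(arr)
    # Compare each window against its first element; overlapping windows all count
    starts = [i for i in range(len(arr) - m + 1) if np.all(arr[i:i + m] == arr[i])]
    return np.array(starts) if starts else None
```

On both a and b this gives [2, 6, 11, 12, 17, 18] for m = 2, [11, 17] for m = 3, and None for m = 4.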

Recommended Answer

Approach #1

We could leverage 1D convolution for a vectorized solution -

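import numpy as np

def consec_repeat_starts(a, n):
    N = n-1
    m = a[:-1]==a[1:]   # m[i] is True where a[i] == a[i+1]
    # A window of N consecutive Trues in m marks a run of n equal values
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1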
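To see why the convolution works, the sketch below traces the intermediate values on a small made-up array x (not from the question) with n = 3. m flags equal neighbours, the full-mode convolution with N ones counts the Trues in each length-N window ending at position k, and a count equal to N marks a run whose start is k - N + 1.

```python
import numpy as np

# Toy array (made up for illustration): a run of three 7s and a pair of 4s
x = np.array([1, 7, 7, 7, 2, 4, 4])
n = 3
N = n - 1

m = x[:-1] == x[1:]
# m -> [False, True, True, False, False, True]

counts = np.convolve(m, np.ones(N, dtype=int))
# Full mode gives len(m) + N - 1 = 7 values; counts[k] = m[k-N+1] + ... + m[k]
# counts -> [0, 1, 2, 1, 0, 1, 1]

ends = np.flatnonzero(counts == N)   # windows of N Trues end at these k
starts = ends - N + 1                # shift back to where each run begins
# starts -> [1]  (the three 7s start at index 1)
```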

Sample run -

In [286]: a
Out[286]: 
array([ 0,  1,  2,  2,  3,  4,  5,  5,  6,  7,  8,  9,  9,  9, 10, 11, 12,
       13, 13, 13, 14, 15])

In [287]: consec_repeat_starts(a, 2)
Out[287]: array([ 2,  6, 11, 12, 17, 18])

In [288]: consec_repeat_starts(a, 3)
Out[288]: array([11, 17])

In [289]: consec_repeat_starts(a, 4)
Out[289]: array([], dtype=int64)
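The answer returns an empty array rather than None when no run exists. If the None behaviour from the question is required, a thin wrapper can convert the result. In the sketch below, find_repeats is a hypothetical name, and consec_repeat_starts is copied from above so the block runs on its own.

```python
import numpy as np

def consec_repeat_starts(a, n):
    N = n-1
    m = a[:-1]==a[1:]   # m[i] is True where a[i] == a[i+1]
    return np.flatnonzero(np.convolve(m,np.ones(N, dtype=int))==N)-N+1

def find_repeats(a, n):
    """Hypothetical wrapper: start indices of runs of length n, or None if there are none."""
    starts = consec_repeat_starts(np.asarray(a), n)
    return starts if starts.size else None
```

Running it on the unsorted array b from the update gives the same m = 2 starts as for a, [2, 6, 11, 12, 17, 18], since only adjacent equality matters.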

Approach #2

We could also use binary erosion -

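import numpy as np
from scipy.ndimage import binary_erosion   # scipy.ndimage.morphology is deprecated

def consec_repeat_starts_v2(a, n):
    N = n-1
    m = a[:-1]==a[1:]   # m[i] is True where a[i] == a[i+1]
    # Erosion keeps i only if its whole length-N neighbourhood in m is True;
    # that window is centred at offset N//2, so shift back to the run start
    return np.flatnonzero(binary_erosion(m,[1]*N))-(N//2)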
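Binary erosion with a flat structuring element of N ones keeps a position only when every element of its length-N neighbourhood is True, which is why the result is shifted back by N//2. As a SciPy-free cross-check (a swapped-in technique, not part of the original answer), the same all-True window test can be written with NumPy's sliding_window_view, available in NumPy 1.20 or later. Each row of the view starts at its own index, so no shift is needed. The function name below is hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def consec_repeat_starts_windows(a, n):
    """Hypothetical helper: start indices of runs of n equal adjacent values (n >= 2)."""
    a = np.asarray(a)
    if n < 2:
        raise ValueError("n must be at least 2")
    N = n - 1
    m = a[:-1] == a[1:]          # m[i] is True where a[i] == a[i+1]
    if N > m.size:               # array too short to contain a run of length n
        return np.array([], dtype=np.intp)
    # Row i of the view is m[i:i+N]; an all-True row means a[i:i+n] are all equal
    windows = sliding_window_view(m, N)
    return np.flatnonzero(windows.all(axis=1))
```

This trades the convolution or erosion for an explicit window view; it is convenient for checking the other two approaches, though the convolution version avoids materializing the window axis in the reduction.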
