Searching a sequence in a NumPy array

Question

Suppose I have the following array:

 array([2, 0, 0, 1, 0, 1, 0, 0])

How do I get the indices where the sequence of values [0,0] occurs? For this case, the expected output would be [1,2,6,7].

1) Please note that [0,0] is just an example sequence. It could be [0,0,0], [4,6,8,9], [5,2,0], or anything else.

2) If my array were changed to array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0]), the expected result for the same sequence [0,0] would be [1,2,3,4,8,9].

I am looking for a NumPy shortcut.

Recommended answer

Well, this is basically a template-matching problem, which comes up a lot in image processing. This post lists two approaches: one based on pure NumPy and one based on OpenCV (cv2).

Approach #1: With NumPy, one can create a 2D array of sliding indices across the entire length of the input array, so that each row is a sliding window of elements. Next, compare each row with the input sequence; broadcasting makes this a vectorized operation. Rows that are all True are perfect matches, and their row numbers are the starting indices of the matches. Finally, extend each starting index over the length of the sequence to get the desired output. The implementation would be:

import numpy as np

def search_sequence_numpy(arr,seq):
    """ Find sequence in an array using NumPy only.

    Parameters
    ----------    
    arr    : input 1D array
    seq    : input 1D array

    Output
    ------    
    Output : 1D Array of indices in the input array that satisfy the 
    matching of input sequence in the input array.
    In case of no match, an empty list is returned.
    """

    # Store sizes of input array and sequence
    Na, Nseq = arr.size, seq.size

    # Range of sequence
    r_seq = np.arange(Nseq)

    # Create a 2D array of sliding indices across the entire length of input array.
    # Match up with the input sequence & get the matching starting indices.
    M = (arr[np.arange(Na-Nseq+1)[:,None] + r_seq] == seq).all(1)

    # Get the range of those indices as final output
    if M.any():
        return np.where(np.convolve(M,np.ones((Nseq),dtype=int))>0)[0]
    else:
        return []         # No match found
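
As a side note, the sliding-window step above can also be written with `numpy.lib.stride_tricks.sliding_window_view`, assuming NumPy 1.20 or newer. This is only a minimal sketch and not part of the original answer; the function name `search_sequence_windows` is made up for illustration, and unlike the version above it returns an empty array (not a list) when nothing matches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def search_sequence_windows(arr, seq):
    """Hypothetical variant of search_sequence_numpy built on sliding_window_view."""
    arr = np.asarray(arr)
    seq = np.asarray(seq)

    # No window can exist if seq is empty or longer than arr.
    if seq.size == 0 or seq.size > arr.size:
        return np.array([], dtype=int)

    # Each row of `windows` is one window of arr with the same length as seq
    # (a read-only view, so no data is copied).
    windows = sliding_window_view(arr, seq.size)

    # Starting positions where every element of the window equals seq.
    starts = np.flatnonzero((windows == seq).all(axis=1))

    # Expand each start into the indices it covers and merge overlaps.
    return np.unique(starts[:, None] + np.arange(seq.size))

print(search_sequence_windows(np.array([2, 0, 0, 1, 0, 1, 0, 0]), np.array([0, 0])))
```

For the array from the question, the matching windows start at positions 1 and 6, which expand to the indices 1, 2, 6 and 7.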

Approach #2: With OpenCV (cv2), there is a built-in function for template matching: cv2.matchTemplate. It gives us the starting indices of the matches, and the remaining steps are the same as in the previous approach. Here's the implementation with cv2:

import cv2
import numpy as np
from cv2 import matchTemplate as cv2m

def search_sequence_cv2(arr,seq):
    """ Find sequence in an array using cv2.
    """

    # Run a template match with input sequence as the template across
    # the entire length of the input array and get scores.
    S = cv2m(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF)

    # With floating-point arrays, the matching scores might not be exactly
    # zero, but they would be very small compared to the other scores.
    # So use a very small threshold to compare the scores against
    # and decide which positions are matches.
    thresh = 1e-5 # Would depend on elements in seq. So, be careful setting this.

    # Find the matching indices
    idx = np.where(S.ravel() < thresh)[0]

    # Get the range of those indices as final output
    if len(idx)>0:
        return np.unique((idx[:,None] + np.arange(seq.size)).ravel())
    else:
        return []         # No match found
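
One thing to watch in the cv2 version is the cast to uint8: NumPy wraps integer values outside 0..255 modulo 256, so distinct values can collapse onto the same byte and produce false matches. The NumPy-only sketch below illustrates this; the sample values and the helper `fits_uint8` are made up for illustration and are not part of the original answer.

```python
import numpy as np

arr = np.array([2, 256, 512, 1, 300])
seq = np.array([0, 0])

# Casting to uint8 wraps modulo 256: 256 and 512 both become 0, 300 becomes 44.
arr_u8 = arr.astype('uint8')
print(arr_u8.tolist())  # [2, 0, 0, 1, 44]

def fits_uint8(a):
    """Hypothetical guard: True if every value survives a uint8 cast unchanged."""
    a = np.asarray(a)
    return bool(a.size == 0 or (a.min() >= 0 and a.max() <= 255))

print(fits_uint8(arr))  # False: the cast would make [256, 512] look like [0, 0]
print(fits_uint8(seq))  # True
```

After the cast, the sequence [0,0] would appear at indices 1 and 2 even though the original array contains no zeros, so a check like this before calling the cv2 version may be worth having.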


Sample run

In [512]: arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])

In [513]: seq = np.array([0,0])

In [514]: search_sequence_numpy(arr,seq)
Out[514]: array([1, 2, 3, 4, 8, 9])

In [515]: search_sequence_cv2(arr,seq)
Out[515]: array([1, 2, 3, 4, 8, 9])

Runtime test

In [477]: arr = np.random.randint(0,9,(100000))
     ...: seq = np.array([3,6,8,4])
     ...: 

In [478]: np.allclose(search_sequence_numpy(arr,seq),search_sequence_cv2(arr,seq))
Out[478]: True

In [479]: %timeit search_sequence_numpy(arr,seq)
100 loops, best of 3: 11.8 ms per loop

In [480]: %timeit search_sequence_cv2(arr,seq)
10 loops, best of 3: 20.6 ms per loop

In this test the pure NumPy version was the faster of the two (11.8 ms vs. 20.6 ms per loop), and it seems to be the safer choice as well!
