Searching a sequence in a NumPy array

Question

Suppose I have the following array:

 array([2, 0, 0, 1, 0, 1, 0, 0])

How do I get the indices where the sequence of values [0,0] occurs? For this array, the expected output would be [1,2,6,7].

1) Please note that [0,0] is just an example sequence. It could be [0,0,0], [4,6,8,9], [5,2,0], or anything else.

2) If my array were modified to array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0]), the expected result for the same sequence [0,0] would be [1,2,3,4,8,9].

I am looking for a NumPy shortcut.

Recommended answer

Well, this is basically a template-matching problem, which comes up a lot in image processing. Two approaches are listed in this post: one based on pure NumPy and one based on OpenCV (cv2).

Approach #1: With NumPy, one can create a 2D array of sliding indices spanning the entire length of the input array, so that each row indexes one sliding window of elements. Next, compare each row against the input sequence; broadcasting makes this a vectorized operation. Rows that are all True are exact matches, and their row numbers are the starting indices of the matches. Finally, expand each starting index into a range of indices covering the length of the sequence to get the desired output. The implementation would be:

import numpy as np

def search_sequence_numpy(arr,seq):
    """ Find sequence in an array using NumPy only.

    Parameters
    ----------
    arr    : input 1D array
    seq    : input 1D array

    Output
    ------
    Output : 1D array of indices in the input array that are covered by
    a match of the input sequence.
    In case of no match, an empty list is returned.
    """

    # Store sizes of input array and sequence
    Na, Nseq = arr.size, seq.size

    # Range of sequence
    r_seq = np.arange(Nseq)

    # Create a 2D array of sliding indices across the entire length of input array.
    # Match up with the input sequence & get the matching starting indices.
    M = (arr[np.arange(Na-Nseq+1)[:,None] + r_seq] == seq).all(1)

    # Get the range of those indices as final output
    if M.any():
        return np.where(np.convolve(M,np.ones((Nseq),dtype=int))>0)[0]
    else:
        return []         # No match found
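To make the intermediate steps of Approach #1 concrete, here is a small walkthrough on the array from the question. It is only an illustrative sketch: the variable names (window_idx, starts_mask, covered) are made up for this example and do not appear in the function above.

```python
import numpy as np

arr = np.array([2, 0, 0, 1, 0, 1, 0, 0])
seq = np.array([0, 0])

# Each row holds the indices of one sliding window of length seq.size.
window_idx = np.arange(arr.size - seq.size + 1)[:, None] + np.arange(seq.size)

# Broadcasting compares every window against seq; a row of all True is a match.
starts_mask = (arr[window_idx] == seq).all(axis=1)

# Convolving with ones spreads each matching start over the next seq.size positions.
covered = np.convolve(starts_mask.astype(int), np.ones(seq.size, dtype=int)) > 0

start_indices = np.flatnonzero(starts_mask)
result = np.flatnonzero(covered)
print(start_indices)  # [1 6]
print(result)         # [1 2 6 7]
```

The two windows starting at indices 1 and 6 match, and the convolution turns those two starts into the four covered indices expected in the question.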

Approach #2: With OpenCV (cv2), there is a built-in function for template matching: cv2.matchTemplate. It gives us the starting indices of the matches; the remaining steps are the same as in the previous approach. Here's the implementation with cv2:

import numpy as np
import cv2

def search_sequence_cv2(arr,seq):
    """ Find sequence in an array using cv2.
    """

    # Run a template match with input sequence as the template across
    # the entire length of the input array and get scores.
    # Note: the uint8 cast assumes all values fit in the range 0..255.
    S = cv2.matchTemplate(arr.astype('uint8'),seq.astype('uint8'),cv2.TM_SQDIFF)

    # With floating point scores, a match might not be exactly zero, but it
    # would be a very small number compared to the others. So, use a very
    # small value to threshold the scores against and decide on matches.
    thresh = 1e-5 # Would depend on elements in seq. So, be careful setting this.

    # Find the matching indices
    idx = np.where(S.ravel() < thresh)[0]

    # Get the range of those indices as final output
    if len(idx)>0:
        return np.unique((idx[:,None] + np.arange(seq.size)).ravel())
    else:
        return []         # No match found
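One thing that may be worth checking before relying on the cv2 version is the cast to uint8, since integers outside 0..255 wrap around during the cast. The sketch below uses NumPy only (no cv2 call) and made-up data to show how that could produce a window that looks like a match after the cast but is not one in the original array.

```python
import numpy as np

arr = np.array([256, 0, 5, 0, 0])   # made-up data containing a value above 255
seq = np.array([0, 0])

arr_u8 = arr.astype('uint8')        # 256 wraps around to 0
print(arr_u8)                       # [0 0 5 0 0]

# After the cast, the window at index 0 looks like [0, 0],
# even though the original window is [256, 0].
spurious = (arr_u8[:2] == seq).all() and not (arr[:2] == seq).all()
print(spurious)                     # True
```

If the data can contain such values, the pure NumPy approach avoids the issue because it compares the original values directly.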


Sample run

In [512]: arr = np.array([2, 0, 0, 0, 0, 1, 0, 1, 0, 0])

In [513]: seq = np.array([0,0])

In [514]: search_sequence_numpy(arr,seq)
Out[514]: array([1, 2, 3, 4, 8, 9])

In [515]: search_sequence_cv2(arr,seq)
Out[515]: array([1, 2, 3, 4, 8, 9])

Runtime test

In [477]: arr = np.random.randint(0,9,(100000))
     ...: seq = np.array([3,6,8,4])
     ...: 

In [478]: np.allclose(search_sequence_numpy(arr,seq),search_sequence_cv2(arr,seq))
Out[478]: True

In [479]: %timeit search_sequence_numpy(arr,seq)
100 loops, best of 3: 11.8 ms per loop

In [480]: %timeit search_sequence_cv2(arr,seq)
10 loops, best of 3: 20.6 ms per loop

Seems like the pure NumPy based one is the safest and, in this test, the fastest!
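As a side note, newer NumPy versions (1.20 and later) provide numpy.lib.stride_tricks.sliding_window_view, which can stand in for the hand-built index array. The following is a sketch under that assumption, not part of the original answer and not benchmarked here; the function name search_sequence_windows is hypothetical, and unlike the functions above it returns an empty array rather than an empty list when nothing matches.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def search_sequence_windows(arr, seq):
    """Hypothetical variant of the pure NumPy approach using sliding_window_view."""
    arr, seq = np.asarray(arr), np.asarray(seq)
    if seq.size == 0 or seq.size > arr.size:
        return np.array([], dtype=int)
    # View of shape (arr.size - seq.size + 1, seq.size); no index array is built.
    windows = sliding_window_view(arr, seq.size)
    # Rows equal to seq mark the starting indices of matches.
    starts = np.flatnonzero((windows == seq).all(axis=1))
    # Expand each start into the indices it covers; unique() merges overlaps.
    return np.unique(starts[:, None] + np.arange(seq.size))

print(search_sequence_windows([2, 0, 0, 0, 0, 1, 0, 1, 0, 0], [0, 0]))  # [1 2 3 4 8 9]
```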
