蟒蛇/ numpy的子数组的第一次出现 [英] Python/NumPy first occurrence of subarray

查看：148 发布时间：2016/5/30 21:40:14 python numpy arrays

本文介绍了蟒蛇/ numpy的子数组的第一次出现的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

在Python或numpy的，哪些是要找出一个子数组中第一次出现的最好方法？

In Python or NumPy, what is the best way to find out the first occurrence of a subarray?

例如，我有

a = [1, 2, 3, 4, 5, 6]
b = [2, 3, 4]

什么是最快的方法（运行时明智），找出其中b的发生？我理解的字符串，这是一个列表或numpy的ndarray非常容易，但怎么样？

What is the fastest way (run-time-wise) to find out where b occurs in a? I understand for strings this is extremely easy, but what about for a list or numpy ndarray?

非常感谢！

将帖子我preFER的numpy的解决方案，因为从我的经验numpy的矢量比Python列表COM prehension快得多。同时，大数组是巨大的，所以我不想把它转换成字符串;这将是（太）长。

I prefer the numpy solution, since from my experience numpy vectorization is much faster than Python list comprehension. Meanwhile, the big array is huge, so I don't want to convert it into a string; that will be (too) long.

推荐答案

我假设你正在寻找一个特定numpy的解决方案，而不是一个简单的列表COM prehension或循环。一种办法是，使用滚动窗口技术来搜索窗口的适当的大小。这里的rolling_window功能：

I'm assuming you're looking for a numpy-specific solution, rather than a simple list comprehension or for loop. One approach might be to use the rolling window technique to search for windows of the appropriate size. Here's the rolling_window function:

>>> def rolling_window(a, size):
...     shape = a.shape[:-1] + (a.shape[-1] - size + 1, size)
...     strides = a.strides + (a. strides[-1],)
...     return numpy.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)
...

然后，你可以不喜欢

Then you could do something like

>>> a = numpy.arange(10)
>>> numpy.random.shuffle(a)
>>> a
array([7, 3, 6, 8, 4, 0, 9, 2, 1, 5])
>>> rolling_window(a, 3) == [8, 4, 0]
array([[False, False, False],
       [False, False, False],
       [False, False, False],
       [ True,  True,  True],
       [False, False, False],
       [False, False, False],
       [False, False, False],
       [False, False, False]], dtype=bool)

为了使这真的很有用，你必须使用，以减少它沿轴1 所有：

>>> numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
array([False, False, False,  True, False, False, False, False], dtype=bool)

然后，你可以使用，但是你会使用布尔数组。一个简单的方法来获得指标的：

Then you could use that however you'd use a boolean array. A simple way to get the index out:

>>> bool_indices = numpy.all(rolling_window(a, 3) == [8, 4, 0], axis=1)
>>> numpy.mgrid[0:len(bool_indices)][bool_indices]
array([3])

有关列表，你可以修改其中的一个<一href=\"http://stackoverflow.com/questions/6822725/rolling-or-sliding-window-iterator-in-python\">rolling窗口迭代器来使用类似的方法。

For lists you could adapt one of these rolling window iterators to use a similar approach.

有关的非常的大型阵列和子阵列，可以节省内存是这样的：

For very large arrays and subarrays, you could save memory like this:

>>> windows = rolling_window(a, 3)
>>> sub = [8, 4, 0]
>>> hits = numpy.ones((len(a) - len(sub) + 1,), dtype=bool)
>>> for i, x in enumerate(sub):
...     hits &= numpy.in1d(windows[:,i], [x])
... 
>>> hits
array([False, False, False,  True, False, False, False, False], dtype=bool)
>>> hits.nonzero()
(array([3]),)

在另一方面，这将可能是更慢。怎么也不是没有明确的测试要慢得多;看到的杰米答案有检查误报另一个内存节约选项。我想，这两种解决方案之间的速度差将在很大程度上依赖于输入的性质。

On the other hand, this will probably be slower. How much slower isn't clear without testing; see Jamie's answer for another memory-conserving option that has to check false positives. I imagine that the speed difference between these two solutions will depend heavily on the nature of the input.

这篇关于蟒蛇/ numpy的子数组的第一次出现的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

蟒蛇/ numpy的子数组的第一次出现 [英] Python/NumPy first occurrence of subarray

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录关闭

蟒蛇/ numpy的子数组的第一次出现 [英] Python/NumPy first occurrence of subarray

问题描述

推荐答案

相关文章

Python最新文章

热门教程

热门工具

登录 关闭

登录关闭