NumPy: Vectorize finding closest value in an array for each element in another array


Question

known_array : numpy array; consisting of scalar values only; shape: (m, 1)

test_array : numpy array; consisting of scalar values only; shape: (n, 1)

indices : numpy array; shape: (n, 1); for each value in test_array, the index of the closest value in known_array

residual : numpy array; shape: (n, 1); for each value in test_array, the difference from the closest value in known_array

In [17]: known_array = np.array([random.randint(-30,30) for i in range(5)])

In [18]: known_array
Out[18]: array([-24, -18, -13, -30,  29])

In [19]: test_array = np.array([random.randint(-10,10) for i in range(10)])

In [20]: test_array
Out[20]: array([-6,  4, -6,  4,  8, -4,  8, -6,  2,  8])

Sample implementation (not fully vectorized)

def find_nearest(known_array, value):
    idx = (np.abs(known_array - value)).argmin()
    diff = known_array[idx] - value
    return [idx, -diff]

In [22]: indices = np.zeros(len(test_array))

In [23]: residual = np.zeros(len(test_array))

In [24]: for i in range(len(test_array)):
   ....:     [indices[i], residual[i]] =  find_nearest(known_array, test_array[i])
   ....:     

In [25]: indices
Out[25]: array([ 2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.,  2.])

In [26]: residual
Out[26]: array([  7.,  17.,   7.,  17.,  21.,   9.,  21.,   7.,  15.,  21.])

What is the best way to speed up this task? Cython is an option, but I would always prefer to remove the for loop and keep the code pure NumPy.

NB: The following Stack Overflow questions were consulted:

  1. Python/Numpy - Quickly Find the Index in an Array Closest to Some Value (http://stackoverflow.com/questions/6065697/python-numpy-quickly-find-the-index-in-an-array-closest-to-some-value)
  2. Find the index of numerically closest value
  3. find nearest value in numpy array
  4. Finding the nearest value and return the index of array in Python (http://stackoverflow.com/questions/8914491/finding-the-nearest-value-and-return-the-index-of-array-in-python)
  5. finding nearest items across two lists/arrays in Python (http://stackoverflow.com/questions/15363419/finding-nearest-items-across-two-lists-arrays-in-python)

Updates

I did some small benchmarks comparing the non-vectorized and vectorized solutions (accepted answer).

In [48]: [indices1, residual1] = find_nearest_vectorized(known_array, test_array)

In [53]: [indices2, residual2] = find_nearest_non_vectorized(known_array, test_array)

In [54]: indices1==indices2
Out[54]: array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True],   dtype=bool)

In [55]: residual1==residual2
Out[55]: array([ True,  True,  True,  True,  True,  True,  True,  True,  True,  True], dtype=bool)

In [56]: %timeit [indices2, residual2] = find_nearest_non_vectorized(known_array, test_array)
10000 loops, best of 3: 173 µs per loop

In [57]: %timeit [indices1, residual1] = find_nearest_vectorized(known_array, test_array)
100000 loops, best of 3: 16.8 µs per loop

That is roughly a 10x speedup!
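The two functions timed above are not shown in the post. A plausible reconstruction, assuming find_nearest_non_vectorized wraps the loop from the sample implementation and find_nearest_vectorized wraps the accepted answer's broadcasting approach (both function bodies are guesses, not the asker's actual code):

```python
import numpy as np

def find_nearest_non_vectorized(known_array, test_array):
    """Hypothetical wrapper: one argmin per element of test_array."""
    indices = np.zeros(len(test_array), dtype=int)
    residual = np.zeros(len(test_array),
                        dtype=np.result_type(known_array, test_array))
    for i, value in enumerate(test_array):
        idx = np.abs(known_array - value).argmin()
        indices[i] = idx
        residual[i] = value - known_array[idx]
    return indices, residual

def find_nearest_vectorized(known_array, test_array):
    """Hypothetical wrapper: all pairwise differences via broadcasting."""
    differences = test_array.reshape(1, -1) - known_array.reshape(-1, 1)
    indices = np.abs(differences).argmin(axis=0)
    residual = np.diagonal(differences[indices,])
    return indices, residual

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
indices1, residual1 = find_nearest_vectorized(known_array, test_array)
indices2, residual2 = find_nearest_non_vectorized(known_array, test_array)
```

Both versions resolve ties the same way (argmin returns the first minimum), which is why the element-wise comparisons above come out all True.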

known_array is not sorted.

I ran the benchmarks as given in the answer by @cyborg below.

Case 1 : When known_array is sorted

known_array = np.arange(0,1000)
test_array = np.random.randint(0, 100, 10000)
print('Speedups:')
base_time = time_f('base')
for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:
    print func_name + ' is x%.1f faster than base.' % (base_time / time_f(func_name))
    assert np.allclose(base(known_array, test_array), eval(func_name+'(known_array, test_array)'))


Speedups:
diffs is x0.4 faster than base.
searchsorted1 is x81.3 faster than base.
searchsorted2 is x107.6 faster than base.

Firstly, for large arrays the diffs method is actually slower; it also eats up a lot of RAM, and my system hung when I ran it on actual data.
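A rough back-of-the-envelope estimate of why this happens, assuming diffs (defined in @cyborg's answer, not shown here) builds the full m x n difference matrix the way the accepted answer does, and assuming 64-bit integers:

```python
import numpy as np

m, n = 1000, 10000                       # sizes of known_array and test_array in Case 1
itemsize = np.dtype(np.int64).itemsize   # 8 bytes per element (assumed dtype)

# One m x n temporary; np.abs(differences) allocates a second one of the same size.
bytes_per_temporary = m * n * itemsize
print(bytes_per_temporary / 1e6, "MB per m x n temporary")
```

The cost grows with the product m * n, so arrays a few hundred times larger in each dimension would already exceed several gigabytes.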

Case 2 : When known_array is not sorted, which represents the actual scenario

known_array = np.random.randint(0,100,100)
test_array = np.random.randint(0, 100, 100)


Speedups:
diffs is x8.9 faster than base.
AssertionError                            Traceback (most recent call last)
<ipython-input-26-3170078c217a> in <module>()
      5 for func_name in ['diffs', 'searchsorted1', 'searchsorted2']:
      6     print func_name + ' is x%.1f faster than base.' % (base_time /  time_f(func_name))
----> 7     assert np.allclose(base(known_array, test_array),  eval(func_name+'(known_array, test_array)'))

AssertionError: 


searchsorted1 is x14.8 faster than base.

I must also add that the approach needs to be memory efficient; otherwise my 8 GB of RAM is not sufficient. In the base case, it is easily sufficient.
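One way to meet both constraints (unsorted known_array, modest memory) is to sort once, binary-search with np.searchsorted, and map the result back to the original positions. The sketch below is not from the original post; the function name find_nearest_sorted_search is made up for illustration, and it assumes 1-D arrays of a signed integer or float dtype with at least two known values:

```python
import numpy as np

def find_nearest_sorted_search(known_array, test_array):
    """Nearest-value lookup using O(m + n) extra memory (illustrative sketch)."""
    known_array = np.asarray(known_array).ravel()
    test_array = np.asarray(test_array).ravel()

    order = np.argsort(known_array, kind="stable")  # sorted position -> original index
    known_sorted = known_array[order]

    # First sorted position whose value is >= each test value,
    # clipped so that both right and right - 1 are valid positions.
    right = np.clip(np.searchsorted(known_sorted, test_array),
                    1, len(known_sorted) - 1)
    left = right - 1

    # Choose whichever neighbour is closer; ties go to the smaller value.
    use_left = (test_array - known_sorted[left]) <= (known_sorted[right] - test_array)
    nearest = np.where(use_left, left, right)

    indices = order[nearest]                        # back to original positions
    residual = test_array - known_array[indices]
    return indices, residual

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])
indices, residual = find_nearest_sorted_search(known_array, test_array)
```

When a test value is equidistant from two known values, or known_array contains duplicates, the index returned here can differ from the argmin-based one even though the absolute residual is the same, so an exact index comparison against the base implementation may not hold in those cases.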

Recommended answer

For example, you can compute all the differences in one go with:

differences = (test_array.reshape(1,-1) - known_array.reshape(-1,1))

Then use argmin and fancy indexing along with np.diagonal to get the desired indices and differences:

indices = np.abs(differences).argmin(axis=0)
residual = np.diagonal(differences[indices,])

So for

>>> known_array = np.array([-24, -18, -13, -30,  29])
>>> test_array = np.array([-6,  4, -6,  4,  8, -4,  8, -6,  2,  8])

one gets

>>> indices
array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
>>> residual
array([ 7, 17,  7, 17, 21,  9, 21,  7, 15, 21])
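Note that differences[indices,] has shape (n, n) before np.diagonal picks out n entries from it. A minimal variation, not part of the original answer, that pairs each column with its chosen row directly and so avoids that intermediate:

```python
import numpy as np

known_array = np.array([-24, -18, -13, -30, 29])
test_array = np.array([-6, 4, -6, 4, 8, -4, 8, -6, 2, 8])

differences = test_array.reshape(1, -1) - known_array.reshape(-1, 1)  # shape (m, n)
indices = np.abs(differences).argmin(axis=0)                          # shape (n,)

# Pick differences[indices[j], j] for every column j; no (n, n) intermediate.
residual = differences[indices, np.arange(differences.shape[1])]

# Same values, computed from the original arrays instead of the matrix.
residual_direct = test_array - known_array[indices]
```

This still builds the (m, n) difference matrix for the argmin step, so it reduces the extra memory used by the lookup but not the overall m * n cost.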
