Find nearest indices for one array against all values in another array - Python / NumPy
Problem description
I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.
I'm currently using this approach with numpy:
import numpy as np

refArray = np.random.random(16)
myArray = np.random.random(1000)

def find_nearest(array, value):
    # Index of the element of array closest to value
    idx = (np.abs(array - value)).argmin()
    return idx

for value in np.nditer(myArray):
    index = find_nearest(refArray, value)
    print(index)
Unfortunately, this takes ages for a large number of values. Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?
FYI: I don't necessarily need numpy in my script.
Important: the order of both myArray and refArray matters and should not be changed. If sorting is applied, the original indices should be retained in some way.
Recommended answer
Here's one vectorized approach with np.searchsorted, based on this post:
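def closest_argmin(A, B):
    L = B.size
    # Sort B, remembering each sorted value's original index
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    # Insertion position of each value of A in the sorted B
    sorted_idx = np.searchsorted(sorted_B, A)
    # Values larger than every element of B map to the last position
    sorted_idx[sorted_idx == L] = L - 1
    # True where the neighbour on the left is strictly closer
    mask = (sorted_idx > 0) & \
        (np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx]))
    # Step back by one where mask is True, then map to original indices of B
    return sidx_B[sorted_idx - mask]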
Brief explanation:

1. Get the insertion positions of the query values in the sorted reference array. We do this with np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). searchsorted expects a sorted array as its first input, so we need some preparatory work there: the argsort step sorts B and keeps the original indices.

2. Each returned position points at the first sorted value that is not smaller than the query, so the two candidates for the nearest value are that one and its immediate left neighbour (position - 1). Compare the two distances and see which one is closest. We do this at the step that computes mask.

3. Based on whether the left neighbour or the returned position is closest, choose the respective one. This is done by subtracting mask from the indices: the boolean mask values are converted to ints and act as offsets. Indexing sidx_B with the result then maps the sorted positions back to the original indices of B.
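To make the three steps concrete, here is a small trace of the same logic on hand-picked values. The arrays A and B below are hypothetical sample data chosen for illustration and are not from the original post; the intermediate values in the comments can be checked by hand.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
B = np.array([0.9, 0.1, 0.5])           # reference values, deliberately unsorted
A = np.array([0.45, 0.2, 0.95, 0.0])    # query values

# Preparation: sort B and remember where each sorted value came from
sidx_B = B.argsort()                    # [1, 2, 0]
sorted_B = B[sidx_B]                    # [0.1, 0.5, 0.9]

# Step 1: insertion positions of A in sorted_B
pos = np.searchsorted(sorted_B, A)      # [1, 1, 3, 0]
pos[pos == B.size] = B.size - 1         # [1, 1, 2, 0] after clamping 0.95

# Step 2: is the left neighbour strictly closer than the value at pos?
mask = (pos > 0) & (np.abs(A - sorted_B[pos - 1]) < np.abs(A - sorted_B[pos]))
# mask is [False, True, False, False]: only 0.2 is closer to its left neighbour 0.1

# Step 3: step back where mask is True, then map to original indices of B
nearest = sidx_B[pos - mask]            # [2, 1, 0, 1]
print(nearest)
```

The result indexes into the original, unsorted B, so B[nearest] gives [0.5, 0.1, 0.9, 0.1], and neither input array is reordered.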
Benchmarking
Original approach:
def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1
Timings and verification:
In [188]: refArray = np.random.random(16)
...: myArray = np.random.random(1000)
...:
In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop
In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop
In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True
That is a 50x+ speedup for the posted sample, and hopefully more for larger datasets!
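One caveat: the question mentions complex numbers, while the sample data and timings use real values. searchsorted relies on a sort order, and NumPy orders complex values lexicographically (real part first, then imaginary part), which does not in general match distance in the complex plane. For that case, a brute-force broadcasting sketch may be a safer starting point. It is not part of the original answer, the name closest_argmin_bruteforce and the sample values are hypothetical, and it builds a len(A) x len(B) distance matrix, so memory grows with the product of the two sizes.

```python
import numpy as np

def closest_argmin_bruteforce(A, B):
    # Pairwise distances with shape (len(A), len(B));
    # np.abs returns the modulus for complex input
    dist = np.abs(A[:, None] - B[None, :])
    # For each row (one value of A), index of the closest value of B
    return dist.argmin(axis=1)

# Hypothetical complex sample data, for illustration only
A = np.array([1 + 1j, -2 + 0.5j, 0.1 - 3j])
B = np.array([0 - 3j, 1 + 2j, -2 + 0j])
result = closest_argmin_bruteforce(A, B)
print(result)  # [1 2 0]
```

As with closest_argmin, the returned indices refer to the original order of B, and neither array is modified.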