Find nearest indices for one array against all values in another array - Python / NumPy

Problem description

I have a list of complex numbers, and for each one I want to find the closest value in another list of complex numbers.

My current approach uses NumPy:

import numpy as np

refArray = np.random.random(16)
myArray = np.random.random(1000)


def find_nearest(array, value):
    idx = (np.abs(array - value)).argmin()
    return idx

for value in np.nditer(myArray):
    index = find_nearest(refArray, value)
    print(index)

Unfortunately, this takes ages for a large number of values. Is there a faster or more "pythonian" way of matching each value in myArray to the closest value in refArray?

FYI: I don't necessarily need NumPy in my script.

Important: the order of both myArray and refArray matters and should not be changed. If sorting is applied, the original indices should be retained in some way.
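As a minimal sketch of how sorting can coexist with that requirement: argsort returns the permutation that sorts an array, so a sorted copy can be searched while every position in it still maps back to an index in the untouched original. The array values and the names ref, order, and sorted_ref below are assumptions made for illustration.

```python
import numpy as np

ref = np.array([0.5, 0.1, 0.9])   # hypothetical data whose order must be kept

order = ref.argsort()             # order[k] = original index of the k-th smallest value
sorted_ref = ref[order]           # sorted copy; ref itself is not modified

k = 0                             # a position in the sorted copy
original_index = order[k]         # the corresponding position in ref
print(original_index, ref[original_index] == sorted_ref[k])
```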

Recommended answer

Here's one vectorized approach with np.searchsorted, based on this post -

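def closest_argmin(A, B):
    # For each value in A, return the index of the nearest value in B.
    L = B.size
    sidx_B = B.argsort()              # original indices of B in sorted order
    sorted_B = B[sidx_B]
    # Left insertion position of each A value in the sorted copy of B
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx == L] = L - 1   # clip positions past the end
    # True where the left neighbour is strictly closer than the right one
    mask = (sorted_idx > 0) & (
        np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
    )
    # Subtracting the boolean mask steps back one position where it is True
    return sidx_B[sorted_idx - mask]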

Brief explanation:

  • Get the left insertion positions of the values in the sorted array. We do this with np.searchsorted(arr1, arr2, side='left') or just np.searchsorted(arr1, arr2). searchsorted expects a sorted array as its first input, so some preparatory work is needed there: B is sorted through argsort, which also keeps the original indices.

  • Compare the value just before each insertion position (position - 1) with the value at that position and see which one is closer. We do this at the step that computes mask.

  • Depending on which of the two neighbours is closer, choose the respective index. This is done by subtracting mask from the positions: the boolean values are converted to ints and act as offsets of 0 or 1.
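The three steps can be traced on a tiny input. This is only an illustrative walk-through: the array values are made up, and the names idx and nearest are not from the answer.

```python
import numpy as np

A = np.array([0.25, 0.95, 0.05])   # values to match (hypothetical)
B = np.array([0.9, 0.1, 0.5])      # reference values, unsorted (hypothetical)

# Preparation: sort B but remember where each value came from.
sidx_B = B.argsort()               # [1, 2, 0]
sorted_B = B[sidx_B]               # [0.1, 0.5, 0.9]

# Step 1: left insertion positions, clipped so they stay valid indices.
idx = np.searchsorted(sorted_B, A)     # [1, 3, 0]
idx[idx == B.size] = B.size - 1        # [1, 2, 0]

# Step 2: is the neighbour before the position strictly closer?
mask = (idx > 0) & (
    np.abs(A - sorted_B[idx - 1]) < np.abs(A - sorted_B[idx])
)                                      # [True, False, False]

# Step 3: step back by one where mask is True, then map to original indices.
nearest = sidx_B[idx - mask]
print(nearest)                         # [1 0 1]
```

Here 0.25 maps to B[1] = 0.1, 0.95 to B[0] = 0.9, and 0.05 to B[1] = 0.1, all expressed as indices into the original, unsorted B.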

Benchmarking

Original approach -

def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1

Timings and verification -

In [188]: refArray = np.random.random(16)
     ...: myArray = np.random.random(1000)
     ...: 

In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop

In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop

In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True

That is a 50x+ speedup for the posted sample, and hopefully more for larger datasets!
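One caveat worth noting: the question mentions complex numbers, while the sample data and the timings above use real values. NumPy orders complex numbers lexicographically (real part first, then imaginary part), so a neighbour in that sorted order is not necessarily the nearest point in the complex plane. A brute-force alternative, sketched here with a hypothetical helper named closest_argmin_bruteforce that is not part of the answer, builds the full distance table by broadcasting. It uses len(A) x len(B) memory, so it is only an assumption-laden fallback for moderate sizes, but it handles both real and complex inputs and can serve as a cross-check.

```python
import numpy as np

def closest_argmin_bruteforce(A, B):
    # Hypothetical helper, not from the answer: computes |A[i] - B[j]| for
    # every pair and returns, per A[i], the index j of the smallest distance.
    return np.abs(A[:, None] - B[None, :]).argmin(axis=1)

# Cross-check against the per-value loop from the question on real data.
rng = np.random.default_rng(0)
refArray = rng.random(16)
myArray = rng.random(1000)
loop_result = np.array([np.abs(refArray - v).argmin() for v in myArray])
print(np.array_equal(closest_argmin_bruteforce(myArray, refArray), loop_result))

# Complex inputs work the same way, since np.abs gives the modulus.
z = np.array([1 + 1j, -2 + 0.5j])
z_ref = np.array([0 + 0j, 1 + 2j, -2 + 0j])
z_nearest = closest_argmin_bruteforce(z, z_ref)
print(z_nearest)                       # [1 2]
```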
