Find nearest indices for one array against all values in another array - Python / NumPy

Problem description

I have a list of complex numbers for which I want to find the closest value in another list of complex numbers.

My current approach using numpy:

import numpy as np

refArray = np.random.random(16)
myArray = np.random.random(1000)


def find_nearest(array, value):
    # Index of the element of array with the smallest |array - value|
    idx = np.abs(array - value).argmin()
    return idx

for value in np.nditer(myArray):
    index = find_nearest(refArray, value)
    print(index)

Unfortunately, this takes ages for a large number of values. Is there a faster or more "pythonic" way of matching each value in myArray to the closest value in refArray?

FYI: I don't necessarily need numpy in my script.

Important: the order of both myArray and refArray matters and must not be changed. If sorting is applied, the original indices should be retained in some way.
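A standard NumPy way to sort while retaining the original indices is np.argsort: it returns the permutation that sorts the array without touching the array itself, and any position found in the sorted copy maps back to the original index through that same permutation. A minimal sketch with made-up values:

```python
import numpy as np

ref = np.array([0.7, 0.1, 0.4])

# argsort returns the permutation that sorts ref; ref itself is unchanged.
order = np.argsort(ref)        # [1 2 0]
sorted_ref = ref[order]        # a sorted copy: 0.1, 0.4, 0.7

# A position found in the sorted copy maps back to the original index
# through the same permutation.
pos_in_sorted = 1              # the position of 0.4 in sorted_ref
original_index = order[pos_in_sorted]
print(original_index)          # 2, i.e. ref[2] == 0.4
```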

Recommended answer

Here's one vectorized approach with np.searchsorted based on this post -

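import numpy as np

def closest_argmin(A, B):
    # For each value in A, return the index of the closest value in B.
    # Relies on sorting B, so it assumes real-valued 1-D arrays.
    L = B.size
    sidx_B = B.argsort()                       # permutation that sorts B
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)  # left insertion points
    sorted_idx[sorted_idx == L] = L - 1        # clip values beyond max(B)
    # True where the element just before the insertion point is closer
    mask = (sorted_idx > 0) & \
        (np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx]))
    # Step one place left where mask is True, then map back to B's indices
    return sidx_B[sorted_idx - mask]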
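closest_argmin depends on ordering, so it is only valid for real-valued data. Complex numbers have no ordering that tracks distance (NumPy sorts them lexicographically by real part, then imaginary part), so for genuinely complex inputs as described in the question, a different technique, brute-force broadcasting, is a simple vectorized option. A sketch, assuming the full N x M distance matrix fits in memory; closest_argmin_bruteforce is a hypothetical name:

```python
import numpy as np

def closest_argmin_bruteforce(A, B):
    """For each value in A, return the index of the closest value in B.

    Works for real or complex inputs because it only uses np.abs of
    differences. Builds an (A.size, B.size) distance matrix, so memory
    grows as O(N * M).
    """
    A = np.asarray(A).ravel()
    B = np.asarray(B).ravel()
    # A[:, None] - B[None, :] broadcasts to shape (A.size, B.size).
    dist = np.abs(A[:, None] - B[None, :])
    # argmin along axis 1 returns the first minimum, like find_nearest.
    return dist.argmin(axis=1)

ref = np.array([0 + 0j, 1 + 1j, -2 + 0.5j])
vals = np.array([0.9 + 1.2j, -1.5 + 0j, 0.1 - 0.1j])
print(closest_argmin_bruteforce(vals, ref))  # [1 2 0]
```

Neither input is reordered, so the returned indices refer directly to the original ref array.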

Brief explanation:

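  • Get the left insertion positions. We do this with np.searchsorted(arr1, arr2, side='left'), or just np.searchsorted(arr1, arr2) since 'left' is the default. searchsorted expects a sorted array as its first input, so we first sort B via argsort and keep the permutation sidx_B, which maps positions in the sorted copy back to B's original indices (neither input is reordered in place).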

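  • Each insertion index points at the first sorted value that is >= the query (indices past the end are first clipped to the last position), so the closest value is either at that index or immediately before it (index - 1). We compare those two distances in the step that computes mask, which is True where the earlier neighbour is strictly closer.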

  • Based on which neighbour is closer, choose the respective index. This is done by subtracting the boolean mask from the insertion indices (True counts as 1, so those positions move one step left) and then indexing sidx_B to get indices into the original B.
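To make the three steps concrete, here is a small, hand-checkable trace of the intermediates inside closest_argmin, using made-up inputs (A plays the role of myArray, B of refArray):

```python
import numpy as np

A = np.array([0.04, 0.32, 0.9])        # query values (like myArray)
B = np.array([0.8, 0.1, 0.35, 0.0])    # reference values (like refArray)

# Step 1: sort B via argsort (keeping original indices), then find the
# left insertion point of each A value in the sorted copy.
sidx_B = B.argsort()                        # [3 1 2 0]
sorted_B = B[sidx_B]                        # values 0.0, 0.1, 0.35, 0.8
sorted_idx = np.searchsorted(sorted_B, A)   # [1 2 4]
sorted_idx[sorted_idx == B.size] = B.size - 1   # clip 4 -> 3: [1 2 3]

# Step 2: True where the element just before the insertion point is
# strictly closer than the element at the insertion point.
mask = (sorted_idx > 0) & (
    np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
)                                           # [ True False False]

# Step 3: subtracting the boolean mask (True counts as 1) moves those
# positions one place left; sidx_B maps them back to indices into B.
result = sidx_B[sorted_idx - mask]
print(result)                               # [3 2 0]
```

Checking by hand: 0.04 is nearest to B[3] = 0.0, 0.32 to B[2] = 0.35, and 0.9 to B[0] = 0.8.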

Benchmarking

Original approach -

def org_app(myArray, refArray):
    out1 = np.empty(myArray.size, dtype=int)
    for i, value in enumerate(myArray):
        # find_nearest from posted question
        index = find_nearest(refArray, value)
        out1[i] = index
    return out1

Timings and verification -

In [188]: refArray = np.random.random(16)
     ...: myArray = np.random.random(1000)
     ...: 

In [189]: %timeit org_app(myArray, refArray)
100 loops, best of 3: 1.95 ms per loop

In [190]: %timeit closest_argmin(myArray, refArray)
10000 loops, best of 3: 36.6 µs per loop

In [191]: np.allclose(closest_argmin(myArray, refArray), org_app(myArray, refArray))
Out[191]: True

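50x+ speedup (1.95 ms vs 36.6 µs) for the posted sample, and likely more for larger datasets, since the loop makes one Python-level call per element of myArray while the vectorized version does a single sort plus one searchsorted call.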
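np.allclose works on the index arrays above, but since both outputs are integer indices, np.array_equal is the stricter, exact check. A self-contained sketch that repeats the question's find_nearest and the answer's closest_argmin and compares them on larger random inputs (the sizes and seed are arbitrary choices):

```python
import numpy as np

def find_nearest(array, value):
    # Question's per-value approach: index of the smallest |array - value|.
    return np.abs(array - value).argmin()

def closest_argmin(A, B):
    # Answer's vectorized approach (real-valued 1-D inputs only).
    L = B.size
    sidx_B = B.argsort()
    sorted_B = B[sidx_B]
    sorted_idx = np.searchsorted(sorted_B, A)
    sorted_idx[sorted_idx == L] = L - 1
    mask = (sorted_idx > 0) & (
        np.abs(A - sorted_B[sorted_idx - 1]) < np.abs(A - sorted_B[sorted_idx])
    )
    return sidx_B[sorted_idx - mask]

rng = np.random.default_rng(42)
refArray = rng.random(1_000)
myArray = rng.random(10_000)

loop_result = np.array([find_nearest(refArray, v) for v in myArray])
vec_result = closest_argmin(myArray, refArray)

# Exact equality is the right test for integer indices. With continuous
# random data exact ties are practically impossible; on a tie the two
# versions may return different (equally close) indices.
print(np.array_equal(loop_result, vec_result))
```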
