Finding the closest value in two datasets using a for loop

Problem description

In MATLAB, I can identify the values in data_b that come closest to the values in data_a, along with the indices that indicate where in the matrix they occur, using the following code:

clear all; close all; clc;

data_a = [0; 15; 30; 45; 60; 75; 90];
data_b = randi([0, 90], [180, 101]);

[rows_a,cols_a] = size(data_a);  
[rows_b,cols_b] = size(data_b);

val1 = zeros(rows_a,cols_b);
ind1 = zeros(rows_a,cols_b);

for i = 1:cols_b
    for j = 1:rows_a
        [val1(j,i),ind1(j,i)] = min(abs(data_b(:,i) - data_a(j)));
    end
end

Since I would like to phase out MATLAB (I will be out of a license eventually), I decided to try the same in Python, without any luck:

import numpy as np

data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180, 101))

[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape

val1 = np.zeros((rows_a,cols_b))
ind1 = np.zeros((rows_a,cols_b))

for i in range(cols_b):
    for j in range(rows_a):
        [val1[j][i],ind1[j][i]] = np.amin(np.abs(data_b[:][i] - data_a[j]))

The code also produced an error that made me none the wiser:

TypeError: cannot unpack non-iterable numpy.int32 object

If anyone could find time to explain why I am an ignorant fool by indicating what I did wrong, and what I could do to fix it, I would be grateful as this has proven to become a major obstacle for my progress.

Thank you.

Solution

I think you are facing two problems:

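  1. Incorrect indexing of multidimensional arrays: use [i, j] instead of [i][j]. In particular, data_b[:][i] selects row i, whereas data_b[:, i] selects column i.
  2. Improper translation of min() from MATLAB to NumPy: np.amin() returns only the minimum, not the index, so you have to use both argmin() and min().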
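
Both points can be seen on a small array. The following is a minimal sketch, assuming a hypothetical 3x4 array named arr (not part of the question) and an arbitrary target value of 6:

```python
import numpy as np

# Hypothetical 3x4 array, used only for illustration.
arr = np.arange(12).reshape(3, 4)

# arr[:] is a view of the whole array, so arr[:][1] is just arr[1]: row 1.
row_via_chained = arr[:][1]
# arr[:, 1] indexes both axes at once and selects column 1.
col_via_tuple = arr[:, 1]

# np.amin returns a single value, so it cannot be unpacked into two names
# the way MATLAB's [val, idx] = min(...) can.
distances = np.abs(arr[:, 1] - 6)
smallest = np.amin(distances)    # minimum distance
position = np.argmin(distances)  # 0-based index of that minimum

print(row_via_chained, col_via_tuple, smallest, position)
```

Note that the index returned by argmin() is 0-based, while MATLAB's min() reports 1-based indices.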

Your fixed code would look like:

import numpy as np

# seed the generator so the test run is reproducible; can be commented out in production
np.random.seed(0)

data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180, 101))

[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape

val1 = np.zeros((rows_a,cols_b), dtype=int)
ind1 = np.zeros((rows_a,cols_b), dtype=int)

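for i in range(cols_b):
    for j in range(rows_a):
        # absolute distance of every entry in column i to the j-th target value
        diff = np.abs(data_b[:, i] - data_a[j, 0])
        ind1[j, i] = np.argmin(diff)   # 0-based row index of the nearest entry
        val1[j, i] = diff[ind1[j, i]]  # the smallest distance itself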
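
As in the MATLAB version, val1 holds the smallest absolute difference, not the matching entry of data_b. If the nearest entries themselves are wanted, they can be looked up from the indices. The following is a sketch under that assumption, using small hypothetical arrays named targets and samples in place of data_a and data_b:

```python
import numpy as np

# Small hypothetical data set: 3 target values, 4 samples in each of 2 columns.
targets = np.array([0, 15, 30])
samples = np.array([[2, 29],
                    [14, 1],
                    [31, 16],
                    [20, 40]])

n_targets, n_cols = len(targets), samples.shape[1]
idx = np.zeros((n_targets, n_cols), dtype=int)
for i in range(n_cols):
    for j in range(n_targets):
        # row index of the entry in column i nearest to targets[j]
        idx[j, i] = np.argmin(np.abs(samples[:, i] - targets[j]))

# closest[j, i] == samples[idx[j, i], i]: the nearest entry itself.
closest = samples[idx, np.arange(n_cols)]
print(closest)
```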

However, I would avoid direct looping here and I would make good use of broadcasting:

import numpy as np

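# seed the generator so the test run is reproducible; can be commented out in production
np.random.seed(0)

# shape (7, 1, 1): one target value per slice along axis 0
data_a = np.arange(0, 90 + 1, 15).reshape((-1, 1, 1))
# shape (1, 180, 101): the samples, with a leading axis for broadcasting
data_b = np.random.randint(90 + 1, size=(1, 180, 101))

# broadcasting yields shape (7, 180, 101): |target - sample| for every pair
tmp_arr = np.abs(data_a - data_b)
min_idxs = np.argmin(tmp_arr, axis=1)  # shape (7, 101), 0-based row indices
min_vals = np.min(tmp_arr, axis=1)     # shape (7, 101), smallest distances
del tmp_arr  # optional: free the temporary array once it is no longer needed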

The results match those of the loop version, that is, ind1 == min_idxs and val1 == min_vals:

print(np.all(min_idxs == ind1))
# True
print(np.all(min_vals == val1))
# True
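
The same comparison can be packaged as a self-contained check. This is only a sketch: the function names closest_loop and closest_broadcast are hypothetical, and it uses NumPy's newer default_rng generator instead of np.random.seed, so the random data differ from the snippets above.

```python
import numpy as np

def closest_loop(targets, samples):
    """Explicit double loop, mirroring the MATLAB version."""
    idx = np.zeros((targets.size, samples.shape[1]), dtype=int)
    val = np.zeros_like(idx)
    for i in range(samples.shape[1]):
        for j in range(targets.size):
            diff = np.abs(samples[:, i] - targets[j])
            idx[j, i] = np.argmin(diff)
            val[j, i] = diff[idx[j, i]]
    return val, idx

def closest_broadcast(targets, samples):
    """Same result through one broadcast subtraction."""
    # (n_targets, 1, 1) - (1, n_rows, n_cols) -> (n_targets, n_rows, n_cols)
    diff = np.abs(targets.reshape(-1, 1, 1) - samples[np.newaxis, :, :])
    return diff.min(axis=1), diff.argmin(axis=1)

rng = np.random.default_rng(0)
targets = np.arange(0, 91, 15)
samples = rng.integers(0, 91, size=(180, 101))

val_loop, idx_loop = closest_loop(targets, samples)
val_bc, idx_bc = closest_broadcast(targets, samples)
print(val_bc.shape, np.array_equal(val_loop, val_bc), np.array_equal(idx_loop, idx_bc))
```

The broadcast version trades memory for speed: the temporary array holds one distance per (target, row, column) triple, which is small here but grows with all three dimensions.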
