Finding the closest value in two datasets using a for loop
Problem description
In MATLAB, I am able to identify the values in data_b that come closest to the values in data_a, along with the indices that indicate where in the matrix they occur, using the following code:
clear all; close all; clc;
data_a = [0; 15; 30; 45; 60; 75; 90];
data_b = randi([0, 90], [180, 101]);
[rows_a,cols_a] = size(data_a);
[rows_b,cols_b] = size(data_b);
val1 = zeros(rows_a,cols_b);
ind1 = zeros(rows_a,cols_b);
for i = 1:cols_b
    for j = 1:rows_a
        [val1(j,i),ind1(j,i)] = min(abs(data_b(:,i) - data_a(j)));
    end
end
Since I would like to phase out MATLAB (I will eventually be without a license), I decided to try the same in Python, without any luck:
import numpy as np
data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180, 101))
[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape
val1 = np.zeros((rows_a,cols_b))
ind1 = np.zeros((rows_a,cols_b))
for i in range(cols_b):
    for j in range(rows_a):
        [val1[j][i],ind1[j][i]] = np.amin(np.abs(data_b[:][i] - data_a[j]))
The code also produced an error that made me none the wiser:
TypeError: cannot unpack non-iterable numpy.int32 object
If anyone could find time to explain why I am an ignorant fool by indicating what I did wrong, and what I could do to fix it, I would be grateful, as this has proven to be a major obstacle to my progress.
Thank you.
I think you are facing two problems:
- Incorrect use of slicing for multidimensional arrays: use [i, j] instead of [i][j].
- Improper translation of min() from MATLAB to NumPy: you have to use both argmin() and min().
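To make the two points above concrete, here is a minimal sketch on a small stand-in array (the name arr and the target value 4 are illustrative assumptions, not part of the original code). It shows that arr[:][i] selects a row rather than a column, and that np.amin() returns a single scalar, which is why the tuple unpacking raised a TypeError.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)  # hypothetical stand-in for data_b

# arr[:] is just the whole array again, so arr[:][1] is ROW 1, not column 1.
row_1 = arr[:][1]
col_1 = arr[:, 1]
print(row_1)  # [4 5 6 7]
print(col_1)  # [1 5 9]

# MATLAB's [m, k] = min(x) returns the value and its index together.
# NumPy's amin()/min() returns only the value, so unpacking it fails.
diff = np.abs(col_1 - 4)
try:
    v, k = np.amin(diff)
    unpack_error = None
except TypeError as exc:
    unpack_error = type(exc).__name__
print(unpack_error)  # TypeError

# The index and the value come from two separate steps.
idx = int(np.argmin(diff))  # 0-based position of the first minimum
val = int(diff[idx])        # the minimum itself, equal to diff.min()
print(idx, val)  # 1 1
```

Note that np.argmin() returns 0-based indices, so they are one less than the indices MATLAB's min() would report for the same data.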
Your fixed code would look like:
import numpy as np
# just to make it reproducible in testing, can be commented for production
np.random.seed(0)
data_a = np.array([[0],[15],[30],[45],[60],[75],[90]])
data_b = np.random.randint(91, size=(180, 101))
[rows_a,cols_a] = data_a.shape
[rows_b,cols_b] = data_b.shape
val1 = np.zeros((rows_a,cols_b), dtype=int)
ind1 = np.zeros((rows_a,cols_b), dtype=int)
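for i in range(cols_b):
    for j in range(rows_a):
        # absolute distance of every entry in column i from data_a[j]
        diff = np.abs(data_b[:, i] - data_a[j, 0])
        ind1[j, i] = np.argmin(diff)  # 0-based row index of the closest entry
        val1[j, i] = diff[ind1[j, i]]  # the smallest distance itself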
However, I would avoid direct looping here and I would make good use of broadcasting:
import numpy as np
# just to make it reproducible in testing, can be commented for production
np.random.seed(0)
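# shape (7, 1, 1): one target value along the first axis
data_a = np.arange(0, 90 + 1, 15).reshape((-1, 1, 1))
# shape (1, 180, 101): with the seed reset, the same draws as in the loop version
data_b = np.random.randint(90 + 1, size=(1, 180, 101))
# broadcasting gives shape (7, 180, 101): the distance to every target at once
tmp_arr = np.abs(data_a - data_b)
min_idxs = np.argmin(tmp_arr, axis=1)  # shape (7, 101)
min_vals = np.min(tmp_arr, axis=1)  # shape (7, 101)
del tmp_arr  # optional: frees the intermediate array once it is no longer needed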
where now ind1 == min_idxs and val1 == min_vals, i.e.:
print(np.all(min_idxs == ind1))
# True
print(np.all(min_vals == val1))
# True
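Both versions above return the smallest distance and its row index, not the matching entry of data_b itself. If the closest values are what you need, one possible way to look them up is np.take_along_axis(). This is only a sketch: the names targets, data, dist, idx and closest are illustrative, and the generator seed is an arbitrary assumption.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: arbitrary seed, for repeatability only
targets = np.arange(0, 91, 15)               # shape (7,), plays the role of data_a
data = rng.integers(0, 91, size=(180, 101))  # shape (180, 101), plays the role of data_b

# Distances of every entry to every target via broadcasting: shape (7, 180, 101).
dist = np.abs(data[np.newaxis, :, :] - targets[:, np.newaxis, np.newaxis])
idx = np.argmin(dist, axis=1)                # shape (7, 101), 0-based row indices

# closest[j, i] == data[idx[j, i], i], the entry of column i nearest to targets[j].
closest = np.take_along_axis(data, idx, axis=0)
print(closest.shape)  # (7, 101)
```

The distances can then be recovered as np.abs(closest - targets[:, np.newaxis]), which should match np.min(dist, axis=1).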