Finding intersection of two matrices in Python within a tolerance?
Question
I'm looking for the most efficient way of finding the intersection of two different-sized matrices. Each matrix has three variables (columns) and a varying number of observations (rows). For example, matrices a and b:
a = np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
b = np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
If I set the tolerance for each column as col1 = 1, col2 = 2, and col3 = 10, I would want a function that outputs the indices in a and b that are within their respective tolerances, for example:
[x1, x2] = func(a, b, col1, col2, col3)
print x1
>> [2 3]
print x2
>> [1 3]
You can see from the indices that element 2 of a is within the tolerances of element 1 of b.
I'm thinking I could loop through each element of matrix a, check if it's within the tolerances of each element in b, and do it that way. But it seems inefficient for very large data sets.
Any suggestions for alternatives to a looping method for accomplishing this?
Recommended answer
If you don't mind working with NumPy arrays, you could exploit broadcasting
for a vectorized solution. Here's the implementation -
# Set tolerance values for each column
tol = [1, 2, 10]
# Get absolute differences between a and b keeping their columns aligned
diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))
# Compare each row with the triplet from `tol`.
# Get mask of all matching rows and finally get the matching indices
x1,x2 = np.nonzero((diffs < tol).all(2))
Sample run -
In [46]: # Inputs
...: a=np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
...: b=np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
...:
In [47]: # Set tolerance values for each column
...: tol = [1, 2, 10]
...:
...: # Get absolute differences between a and b keeping their columns aligned
...: diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))
...:
...: # Compare each row with the triplet from `tol`.
...: # Get mask of all matching rows and finally get the matching indices
...: x1,x2 = np.nonzero((diffs < tol).all(2))
...:
In [48]: x1,x2
Out[48]: (array([2, 3]), array([1, 3]))
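For reuse, the steps above could be wrapped into a function shaped like the func(a, b, ...) asked for in the question. This is a minimal sketch, not part of the original answer: the name match_within_tolerance and the inclusive flag are assumptions made for illustration. Note that the answer compares with a strict <, so with integer data a tolerance of 1 only accepts exact matches in that column; inclusive=True switches to <= if that is the intended meaning of "within tolerance".

```python
import numpy as np

def match_within_tolerance(a, b, tol, inclusive=False):
    """Return (rows of a, rows of b) whose columns all differ by less than tol.

    Hypothetical helper: the name and the `inclusive` flag are illustrative.
    """
    # Convert to plain ndarrays so np.matrix inputs broadcast like arrays
    a = np.asarray(a)
    b = np.asarray(b)
    tol = np.asarray(tol)

    # Shape (na, 1, ncols) minus (1, nb, ncols) broadcasts to (na, nb, ncols)
    diffs = np.abs(a[:, None, :] - b[None, :, :])

    # Per-column comparison against the matching tolerance
    close = diffs <= tol if inclusive else diffs < tol

    # A pair matches only if every column is within tolerance
    return np.nonzero(close.all(axis=2))

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])

x1, x2 = match_within_tolerance(a, b, tol=[1, 2, 10])
print(x1, x2)  # [2 3] [1 3]
```

The two returned arrays are paired: x1[k] in a matches x2[k] in b, so a row of a that matches several rows of b appears several times in x1.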
Large data sizes: If your data is large enough that the broadcasting approach causes memory issues, you can exploit the fact that the number of columns is small (3 here). Loop over the 3 columns instead, which needs only 3 iterations and saves a lot of memory, like so -
na = a.shape[0]
nb = b.shape[0]
accum = np.ones((na,nb),dtype=bool)
for i in range(a.shape[1]):
    accum &= np.abs((a[:,i] - b[:,i].ravel())) < tol[i]
x1,x2 = np.nonzero(accum)
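To make the memory saving concrete, the sketch below compares the largest temporary array of the two approaches on random data and checks that both produce the same mask. The array sizes, the random inputs, and the variable names are assumptions chosen for illustration; plain ndarrays are used instead of np.matrix.

```python
import numpy as np

# Illustrative random inputs (sizes are arbitrary assumptions)
rng = np.random.default_rng(0)
a = rng.integers(0, 100, size=(200, 3))
b = rng.integers(0, 100, size=(300, 3))
tol = np.array([1, 2, 10])

# Broadcasting approach: one (na, nb, ncols) array of differences
diffs = np.abs(a[:, None, :] - b[None, :, :])
mask_full = (diffs < tol).all(axis=2)

# Column loop: only one (na, nb) array of differences exists at a time
accum = np.ones((a.shape[0], b.shape[0]), dtype=bool)
largest_temp = 0
for i in range(a.shape[1]):
    col_diff = np.abs(a[:, i, None] - b[:, i])  # shape (na, nb)
    largest_temp = max(largest_temp, col_diff.nbytes)
    accum &= col_diff < tol[i]

print("broadcast temporary bytes:", diffs.nbytes)
print("largest loop temporary bytes:", largest_temp)
print("same result:", np.array_equal(mask_full, accum))
```

The difference array in the loop is smaller by a factor equal to the number of columns. Both approaches still hold an (na, nb) boolean mask, so for very large inputs that mask itself can become the limiting factor.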