Finding intersection of two matrices in Python within a tolerance?

Question

I'm looking for the most efficient way of finding the intersection of two different-sized matrices. Each matrix has three variables (columns) and a varying number of observations (rows). For example, matrices a and b:

import numpy as np
a = np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
b = np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')

If I set the tolerance for each column as col1 = 1, col2 = 2, and col3 = 10, I would want a function that outputs the indices of the rows in a and b that match within those tolerances, for example:

[x1, x2] = func(a, b, col1, col2, col3)
print(x1)
>> [2 3]
print(x2)
>> [1 3]

You can see from the indices (0-based) that row 2 of a is within the tolerances of row 1 of b, and row 3 of a is within the tolerances of row 3 of b.

I'm thinking I could loop through each row of a, check whether it is within the tolerances of each row of b, and collect the matches that way. But that seems inefficient for very large data sets.
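For reference, a minimal sketch of that looping approach is below. It assumes the inputs can be converted with `np.asarray` and that a pair matches when every column differs by strictly less than its tolerance; `match_loop` is a hypothetical name. It performs na × nb Python-level comparisons, which is why it scales poorly.

```python
import numpy as np

def match_loop(a, b, tol):
    """Naive double loop (hypothetical helper): compare every row of a
    with every row of b.

    Returns (x1, x2), the row indices into a and b of each matching pair.
    A pair matches when every column differs by strictly less than tol.
    """
    A = np.asarray(a)
    B = np.asarray(b)
    tol = np.asarray(tol)
    x1, x2 = [], []
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            # |A[i] - B[j]| < tol must hold for all three columns
            if np.all(np.abs(A[i] - B[j]) < tol):
                x1.append(i)
                x2.append(j)
    return np.array(x1, dtype=int), np.array(x2, dtype=int)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_loop(a, b, [1, 2, 10])
print(x1, x2)  # [2 3] [1 3]
```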

Any suggestions for alternatives to a looping method for accomplishing this?

Recommended answer

If you don't mind working with NumPy arrays, you could exploit broadcasting for a vectorized solution. Here's the implementation -

import numpy as np

# Set tolerance values for each column
tol = [1, 2, 10]

# Convert to plain arrays (np.matrix is always 2-D), then take absolute
# differences between every row of a and every row of b, keeping the
# columns aligned: (na, 1, 3) - (nb, 3) -> (na, nb, 3)
A = np.asarray(a)
B = np.asarray(b)
diffs = np.abs(A[:, None] - B)

# Compare each row with the triplet from `tol`.
# Get mask of all matching rows and finally get the matching indices
x1, x2 = np.nonzero((diffs < tol).all(2))
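The same steps can be wrapped in a function shaped like the `func` asked for in the question. This is a sketch: `find_matches` and its `inclusive` flag are hypothetical names, not part of the original answer. The answer compares with strict `<`; `inclusive=True` switches to `<=` in case a difference exactly equal to the tolerance should also count (for the example data both give the same result).

```python
import numpy as np

def find_matches(a, b, tol, inclusive=False):
    """Vectorized tolerance match via broadcasting (hypothetical helper).

    a: (na, k) array-like, b: (nb, k) array-like, tol: length-k tolerances.
    Returns (x1, x2): indices into a and b of every pair of rows whose
    per-column absolute differences are all < tol (or <= tol if inclusive).
    """
    A = np.asarray(a)
    B = np.asarray(b)
    tol = np.asarray(tol)
    # (na, 1, k) - (1, nb, k) broadcasts to (na, nb, k)
    diffs = np.abs(A[:, None, :] - B[None, :, :])
    within = diffs <= tol if inclusive else diffs < tol
    # A pair matches only if every column is within tolerance
    return np.nonzero(within.all(axis=2))

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = find_matches(a, b, [1, 2, 10])
print(x1, x2)  # [2 3] [1 3]
```

Passing `np.matrix` inputs also works, since `np.asarray` turns them into plain 2-D arrays before the broadcasting step.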

Sample run -

In [46]: # Inputs
    ...: a=np.matrix('1 5 1003; 2 4 1002; 4 3 1008; 8 1 2005')
    ...: b=np.matrix('7 9 1006; 4 4 1007; 7 7 1050; 8 2 2003; 9 9 3000; 7 7 1000')
    ...: 

In [47]: # Set tolerance values for each column
    ...: tol = [1, 2, 10]
    ...: 
    ...: # Get absolute differences between a and b keeping their columns aligned
    ...: diffs = np.abs(np.asarray(a[:,None]) - np.asarray(b))
    ...: 
    ...: # Compare each row with the triplet from `tol`.
    ...: # Get mask of all matching rows and finally get the matching indices
    ...: x1,x2 = np.nonzero((diffs < tol).all(2))
    ...: 

In [48]: x1,x2
Out[48]: (array([2, 3]), array([1, 3]))
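To see what the broadcasting step does, it can help to inspect the intermediate shapes. The sketch below repeats the sample run with plain arrays (an assumption; the question uses `np.matrix`) and prints the shape of `diffs` and of the final mask, plus the per-column differences for the matching pair (row 2 of a, row 1 of b).

```python
import numpy as np

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
tol = [1, 2, 10]

diffs = np.abs(a[:, None] - b)   # (4, 1, 3) - (6, 3) -> (4, 6, 3)
per_col = diffs < tol            # tol with shape (3,) broadcasts over the last axis
mask = per_col.all(axis=2)       # (4, 6): True where all 3 columns match
print(diffs.shape, mask.shape)   # (4, 6, 3) (4, 6)
print(diffs[2, 1])               # [0 1 1]
```

Every (row of a, row of b) pair gets its own slot in `diffs`, which is also why memory grows as na × nb × 3.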


Large data sizes case: if you are working with data large enough that the full (na, nb, 3) `diffs` array causes memory issues, and since you already know that the number of columns is small (3), you can use a minimal loop of 3 iterations instead. Each iteration only builds (na, nb) temporaries rather than (na, nb, 3) ones, which saves a large part of the memory footprint, like so -

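A = np.asarray(a)
B = np.asarray(b)
na = A.shape[0]
nb = B.shape[0]
# Running AND of the per-column matches, shape (na, nb)
accum = np.ones((na, nb), dtype=bool)
for i in range(A.shape[1]):
    # (na, 1) - (nb,) -> (na, nb) differences for column i only
    accum &= np.abs(A[:, i, None] - B[:, i]) < tol[i]
x1, x2 = np.nonzero(accum)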
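If even the (na, nb) arrays from the column loop are too large, another option (not from the original answer) is to apply the broadcasting version to blocks of rows of `a`, so the largest temporary has shape (chunk, nb, 3). This is a sketch assuming the same strict `<` comparison; `match_chunked` and `chunk` are hypothetical names, and a good chunk size depends on nb and the available memory.

```python
import numpy as np

def match_chunked(a, b, tol, chunk=1024):
    """Broadcasting match over blocks of rows of a (hypothetical helper).

    Only a (chunk, nb, k) difference array exists at any one time, instead
    of the full (na, nb, k) array. Results come out in the same order as
    np.nonzero on the full (na, nb) mask.
    """
    A = np.asarray(a)
    B = np.asarray(b)
    tol = np.asarray(tol)
    x1_parts, x2_parts = [], []
    for start in range(0, A.shape[0], chunk):
        block = A[start:start + chunk]
        # (c, 1, k) - (nb, k) -> (c, nb, k), compared per column against tol
        mask = (np.abs(block[:, None] - B) < tol).all(axis=2)
        i, j = np.nonzero(mask)
        x1_parts.append(i + start)   # shift local row indices back to global
        x2_parts.append(j)
    if not x1_parts:                 # a has no rows
        empty = np.empty(0, dtype=np.intp)
        return empty, empty.copy()
    return np.concatenate(x1_parts), np.concatenate(x2_parts)

a = np.array([[1, 5, 1003], [2, 4, 1002], [4, 3, 1008], [8, 1, 2005]])
b = np.array([[7, 9, 1006], [4, 4, 1007], [7, 7, 1050],
              [8, 2, 2003], [9, 9, 3000], [7, 7, 1000]])
x1, x2 = match_chunked(a, b, [1, 2, 10], chunk=3)
print(x1, x2)  # [2 3] [1 3]
```

Because each chunk's indices are shifted by `start` and chunks are processed in order, the concatenated result matches what `np.nonzero` returns on the full mask.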
