Find the set difference between two large arrays (matrices) in Python


Problem description

I have two large 2-d arrays and I'd like to find their set difference taking their rows as elements. In Matlab, the code for this would be setdiff(A,B,'rows'). The arrays are large enough that the obvious looping methods I could think of take too long.

Recommended answer

This should work, but it is currently broken in NumPy 1.6.1 because mergesort is not available for the view being created. It works in the pre-release NumPy 1.7.0. This should be the fastest approach, since the views don't have to copy any memory:

>>> import numpy as np
>>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
>>> a1_rows = a1.view([('', a1.dtype)] * a1.shape[1])
>>> a2_rows = a2.view([('', a2.dtype)] * a2.shape[1])
>>> np.setdiff1d(a1_rows, a2_rows).view(a1.dtype).reshape(-1, a1.shape[1])
array([[1, 2, 3]])
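
The same idea can be wrapped in a small helper. This is only a sketch: the name `setdiff_rows` is hypothetical, and it assumes both inputs are 2-d with the same number of columns. It forces C-contiguous memory and a common dtype first, since the structured view needs each row to occupy one contiguous run of bytes. Like MATLAB's setdiff, `np.setdiff1d` returns sorted, unique values, so the original row order of the first array is not preserved.

```python
import numpy as np

def setdiff_rows(a, b):
    """Return the unique rows of `a` that do not appear in `b` (hypothetical helper)."""
    # The structured view requires contiguous rows and matching dtypes.
    a = np.ascontiguousarray(a)
    b = np.ascontiguousarray(b, dtype=a.dtype)
    if a.ndim != 2 or b.ndim != 2 or a.shape[1] != b.shape[1]:
        raise ValueError("both inputs must be 2-d with the same number of columns")

    # One structured element per row: a record with one field per column.
    row_dtype = [('', a.dtype)] * a.shape[1]
    a_rows = a.view(row_dtype).ravel()
    b_rows = b.view(row_dtype).ravel()

    # setdiff1d compares whole records, i.e. whole rows.
    diff = np.setdiff1d(a_rows, b_rows)

    # Convert the records back into an ordinary 2-d array.
    return diff.view(a.dtype).reshape(-1, a.shape[1])

a1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = np.array([[4, 5, 6], [7, 8, 9], [1, 1, 1]])
print(setdiff_rows(a1, a2))
```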

You can also do this in pure Python with sets of tuples, but it might be slow:

>>> import numpy as np
>>> a1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> a2 = np.array([[4,5,6],[7,8,9],[1,1,1]])
>>> a1_rows = set(map(tuple, a1))
>>> a2_rows = set(map(tuple, a2))
>>> a1_rows.difference(a2_rows)
set([(1, 2, 3)])
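
If an array is needed rather than a set, one possible variation is to build the set only from the second array and use it to filter the rows of the first. This is a sketch, not part of the original answer: unlike a true set difference, it keeps the original row order of `a1` and any duplicate rows in it. The names `a2_rows`, `keep` and `result` are chosen here for illustration.

```python
import numpy as np

a1 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
a2 = np.array([[4, 5, 6], [7, 8, 9], [1, 1, 1]])

# Hashable representation of every row in a2 (tolist() yields plain Python ints).
a2_rows = set(map(tuple, a2.tolist()))

# Boolean mask: True for rows of a1 that are absent from a2.
keep = np.array([tuple(row) not in a2_rows for row in a1.tolist()], dtype=bool)

# Boolean indexing selects the surviving rows as a regular 2-d array.
result = a1[keep]
print(result)
```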
