Numpy array: show only unique rows
Problem description
I want to get the rows of an array that are unique. Unlike numpy's unique
function, I want to exclude all rows that occur more than once.
So the input:
[[1,1],[1,1],[1,2],[2,3],[3,4],[3,4]]
should result in the output
[[1,2],[2,3]].
I tried to count the occurrences of each row with np.unique(array, return_counts=True)
and then filter out the entries whose count is >1.
I'm looking both for a more efficient way to do that and for a way to do the same thing without the returned counts, since return_counts was not implemented before numpy 1.9.
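A minimal pure-NumPy sketch of the count-then-filter idea that does not rely on `return_counts`: sort the rows lexicographically with `np.lexsort`, mark where each run of identical rows starts, and derive each run's length from the gaps between those start positions. The function name `rows_occurring_once` is hypothetical. It only uses long-standing calls (`lexsort`, `diff`, `flatnonzero`, `append`), so it should also run on releases older than 1.9, although that has not been verified here.

```python
import numpy as np

def rows_occurring_once(arr):
    """Return the rows of a 2-D array that appear exactly once (sort-based sketch)."""
    arr = np.asarray(arr)
    if arr.shape[0] == 0:
        return arr.copy()
    # np.lexsort treats the LAST key as the primary one, so reverse the
    # columns to sort by column 0 first, then column 1, and so on.
    order = np.lexsort(arr.T[::-1])
    s = arr[order]
    # True wherever a row differs from the previous one, i.e. a new run starts.
    new_group = np.ones(len(s), dtype=bool)
    new_group[1:] = np.any(s[1:] != s[:-1], axis=1)
    starts = np.flatnonzero(new_group)
    # The length of each run is the distance to the next run's start.
    counts = np.diff(np.append(starts, len(s)))
    # Keep one representative of every run of length 1.
    return s[starts[counts == 1]]

print(rows_occurring_once([[1, 1], [1, 1], [1, 2], [2, 3], [3, 4], [3, 4]]))
```

The surviving rows come back in lexicographic order rather than in their original order; for the example input above the result is `[[1, 2], [2, 3]]`.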
Update:
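In my case the data is always of size [m, 2], but once the concept is established it should be easy to transfer to the [m, n] case. In my particular case the data set consists of integers, but the solution does not have to be limited to that assumption. A typical data set will have m ~ 10^7.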
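For the [m, 2] integer case described in the update, one possible sketch (an assumption about the data, not something proposed in the original thread) packs each row into a single int64 key, so that a 1-D `np.unique` with `return_inverse` plus `np.bincount` can do the counting; both of those were available before numpy 1.9. The helper name `unique_rows_int_pairs` is hypothetical, and the packing assumes the value ranges are small enough that `(max(a) - min(a) + 1) * (max(b) - min(b) + 1)` fits in int64.

```python
import numpy as np

def unique_rows_int_pairs(arr):
    """Rows of an (m, 2) integer array that occur exactly once.

    Sketch assuming integer data whose packed key fits in int64.
    """
    arr = np.asarray(arr, dtype=np.int64)
    if arr.size == 0:
        return arr.reshape(0, 2)
    a, b = arr[:, 0], arr[:, 1]
    a0, b0 = a.min(), b.min()
    width = b.max() - b0 + 1
    # Encode each row (a, b) as one non-negative integer; distinct rows
    # map to distinct keys because 0 <= b - b0 < width.
    keys = (a - a0) * width + (b - b0)
    # Count how often each distinct key occurs, without return_counts.
    uniq, inverse = np.unique(keys, return_inverse=True)
    counts = np.bincount(inverse)
    once = uniq[counts == 1]
    # Decode the surviving keys back into (a, b) pairs.
    return np.column_stack((once // width + a0, once % width + b0))

print(unique_rows_int_pairs([[1, 1], [1, 1], [1, 2], [2, 3], [3, 4], [3, 4]]))
```

Sorting one 1-D integer array is usually cheaper than sorting rows lexicographically, which may matter at m ~ 10^7, but this has not been benchmarked here.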
Recommended answer
The numpy_indexed package (disclaimer: I am its author) solves this problem efficiently, in a fully vectorized manner. I haven't tested it with numpy 1.9 yet, if that is still relevant, but perhaps you'd be willing to give it a spin and let me know. I don't have any reason to believe it will not work with older versions of numpy.
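import numpy as np
import numpy_indexed as npi
a = np.random.rand(10000, 3).round(2)
unique, count = npi.count(a)  # distinct rows and how often each occurs
print(unique[count == 1])  # keep only the rows that occur exactly once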
Note that, as per your original question, this solution is not restricted to a specific number of columns or dtype.
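On NumPy 1.13 and later, `np.unique` also accepts an `axis` argument, so the same count-and-filter can be written without an extra package and, likewise, without restricting the number of columns. A short sketch using the example input from the question:

```python
import numpy as np

a = np.array([[1, 1], [1, 1], [1, 2], [2, 3], [3, 4], [3, 4]])
# axis=0 treats each row as a single element (requires numpy >= 1.13).
uniq, counts = np.unique(a, axis=0, return_counts=True)
# Keep only the rows whose count is exactly 1.
result = uniq[counts == 1]
print(result)
```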