Find unique columns and column membership


Question

I went through these threads:

  • Find unique rows in numpy.array
  • Removing duplicates in each row of a numpy array
  • Pandas: unique dataframe

They all discuss several methods for computing the matrix with unique rows and columns.

However, the solutions look a bit convoluted, at least to the untrained eye. Here, for example, is the top solution from the first thread, which (correct me if I am wrong) I believe is the safest and fastest:

np.unique(a.view(np.dtype((np.void, a.dtype.itemsize*a.shape[1])))).view(a.dtype).reshape(-1, a.shape[1])

Either way, the above solution only returns the matrix of unique rows. What I am looking for is something along the lines of the original functionality of np.unique:

u, indices = np.unique(a, return_inverse=True)

which returns not only the list of unique entries but also, for each item, which unique entry it belongs to. How can I do this for columns?
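For reference, the 1-D behaviour described above can be sketched as follows. The array x is a made-up example, not taken from the linked threads:

```python
import numpy as np

# Hypothetical 1-D input, used only to illustrate return_inverse.
x = np.array([3, 1, 3, 2, 1])

# u holds the sorted unique values; indices[i] is the position in u
# of the value x[i].
u, indices = np.unique(x, return_inverse=True)

# Indexing u with indices rebuilds the original array.
reconstructed = u[indices]
```

Here u[indices] equals x element for element, which is the "membership" property wanted for columns.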

Here is an example of what I am looking for:

array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
       [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

We would have:

u       = array([0,1,2,3,4])
indices = array([0,1,0,1,1,2,3,4,4,3])

where the different values in u represent the set of unique columns in the original array:

0 -> [0,0]
1 -> [2,1]
2 -> [0,1]
3 -> [2,2]
4 -> [1,2]

Answer

Essentially, you want np.unique to return the indices of the unique columns, and the indices of where they're used? This is easy enough to do by transposing the matrix and then using the code from the other question, with the addition of return_inverse=True.

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
_, u, indices = np.unique(b, return_index=True, return_inverse=True)

With your a, this gives:

In [35]: u
Out[35]: array([0, 5, 7, 1, 6])

In [36]: indices
Out[36]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
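One way to sanity-check these outputs is to rebuild a from them: taking the first-occurrence columns a[:, u] and indexing them with indices should give back the original array. A minimal sketch of that check follows. The ravel() call is an addition that is not in the answer's code, assumed here so that the inverse comes back 1-D regardless of NumPy version:

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

at = np.ascontiguousarray(a.T)
# View each row of the transpose (i.e. each column of a) as a single
# opaque void item, then flatten to 1-D (assumption: not in the answer).
b = at.view(np.dtype((np.void, at.dtype.itemsize * at.shape[1]))).ravel()

# u: index of the first occurrence of each unique column in a
# indices: for every column of a, which unique column it matches
_, u, indices = np.unique(b, return_index=True, return_inverse=True)

# Rebuild a from the unique columns and the membership vector.
rebuilt = a[:, u][:, indices]
```

If rebuilt equals a, then u and indices together describe every column of the input.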

It's not entirely clear to me what you want u to be, however. If you want it to be the unique columns themselves, then you could use the following instead:

at = a.T
b = np.ascontiguousarray(at).view(np.dtype((np.void, at.dtype.itemsize * at.shape[1])))
_, idx, indices = np.unique(b, return_index=True, return_inverse=True)
u = a[:,idx]

This gives:

In [41]: u
Out[41]:
array([[0, 0, 1, 2, 2],
       [0, 1, 2, 1, 2]])

In [42]: indices
Out[42]: array([0, 3, 0, 3, 3, 1, 4, 2, 2, 4])
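As an alternative to the void-view trick, np.unique accepts an axis argument in NumPy 1.13 and later, which may be easier to read. The sketch below is not part of the original answer, and it assumes such a NumPy version is installed. The ravel() call is a precaution, since the shape of the returned inverse has differed between NumPy releases:

```python
import numpy as np

a = np.array([[0, 2, 0, 2, 2, 0, 2, 1, 1, 2],
              [0, 1, 0, 1, 1, 1, 2, 2, 2, 2]])

# axis=1 treats each column as one item to deduplicate.
# u: the unique columns, idx: first occurrence of each in a,
# indices: for every column of a, which column of u it matches.
u, idx, indices = np.unique(a, axis=1, return_index=True, return_inverse=True)
indices = indices.ravel()  # normalise to 1-D across NumPy versions

# The unique columns indexed by the membership vector rebuild a.
rebuilt = u[:, indices]
```

For this a, the result should match the u and indices shown above, since both approaches order the unique columns the same way here.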
