Python/Numpy: Grouping array-rows by a common element


Problem description

Suppose I have an array like this:

[[1,2,3]
 [0,4,2]
 [4,2,5] 
 [6,1,1]
 [1,3,5]
 [3,0,1]
 [0,4,2]]

I want to categorize the rows of the array so that any row that has an element in common with some other row belongs to the same category as that row. In general, the array need not consist of integers; its entries can be arbitrary floats. The shared elements must sit in the same position (column) in both rows. For the above array, the categories would be

[[0],
 [1],
 [0],
 [2],
 [0],
 [2],
 [1]]

Note: Every member of a category must share a common number at a common position with AT LEAST ONE other member of the same category. Not every pair of members in a category needs to share a common number at a common position.

Can you think of a solid way to do this?

Recommended answer

This gives you the pairs of rows that have an element in common. The rest depends on your answer to my comment; I will update this once I understand the question correctly:

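# pairs[k] = [i, j] for every i, j (including i == j) where rows i and j of a agree in at least one column
pairs = np.argwhere(((a[:,None]-a)==0).any(axis=2))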
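To make the broadcasting in that one-liner easier to follow, here is a minimal sketch that spells out the intermediate arrays for the example from the question. The names `same_position` and `adjacent` are illustrative and not part of the original answer, and the direct `==` comparison is used in place of subtracting and testing for zero.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [0, 4, 2],
              [4, 2, 5],
              [6, 1, 1],
              [1, 3, 5],
              [3, 0, 1],
              [0, 4, 2]])

# a[:, None] has shape (7, 1, 3); comparing it with a (7, 3) broadcasts to (7, 7, 3).
# same_position[i, j, k] is True when a[i, k] == a[j, k].
same_position = a[:, None] == a

# adjacent[i, j] is True when rows i and j agree in at least one column.
adjacent = same_position.any(axis=2)

# One [i, j] row per True entry; self-pairs (i == i) are included.
pairs = np.argwhere(adjacent)

print(adjacent[0])   # which rows share an element with row 0
print(pairs.shape)   # (17, 2): 7 self-pairs plus 5 edges counted twice
```

For float data that carries rounding noise, exact equality may be too strict; swapping the comparison for `np.isclose(a[:, None], a)` is one possible adjustment, depending on what "equal" should mean for your data.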

UPDATE: With the definition of a category given in the comments, this returns the categories:

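b = np.arange(a.shape[0])  # start with every row in its own category
for pair in pairs:
  # merge: every row labelled like row pair[1] takes the label of row pair[0]
  b[np.flatnonzero(b==b[pair[1]])] = b[pair[0]]
b = b - b.min()  # shift the labels so that the smallest one is 0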

You can probably make this faster by removing self-edges and duplicate edges (each edge appears twice) from pairs before the for loop.

Output:

[0 1 0 3 0 3 1]
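The speed-up suggested above (dropping self-edges and duplicate edges before the loop) could be sketched as follows. The filter `pairs[:, 0] < pairs[:, 1]` and the name `edges` are my own choices for illustration. Because the pair list is symmetric, keeping only the pairs with i < j leaves each undirected edge exactly once.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [0, 4, 2],
              [4, 2, 5],
              [6, 1, 1],
              [1, 3, 5],
              [3, 0, 1],
              [0, 4, 2]])

pairs = np.argwhere((a[:, None] == a).any(axis=2))

# Keep only i < j: this removes self-edges (i == j) and the mirrored copy (j, i).
edges = pairs[pairs[:, 0] < pairs[:, 1]]

b = np.arange(a.shape[0])
for i, j in edges:
    # Every row currently labelled like row j takes the label of row i.
    b[b == b[j]] = b[i]

print(len(pairs), len(edges))   # 17 5
```

For the example array the loop now runs 5 times instead of 17, and the resulting grouping of rows is unchanged.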

The category labels differ from the output in the question, but the categories themselves are the same. If you want to name them differently, simply change the last line of the code, b = b - b.min(), to whatever relabeling you prefer.
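One possible relabeling, shown here as a sketch and not as part of the original answer, is to map the labels to consecutive integers with `np.unique(..., return_inverse=True)`. The categories are then numbered in increasing order of their original label, and the optional reshape gives the column layout used in the question.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [0, 4, 2],
              [4, 2, 5],
              [6, 1, 1],
              [1, 3, 5],
              [3, 0, 1],
              [0, 4, 2]])

pairs = np.argwhere((a[:, None] == a).any(axis=2))

b = np.arange(a.shape[0])
for pair in pairs:
    b[np.flatnonzero(b == b[pair[1]])] = b[pair[0]]

# return_inverse gives, for each entry of b, its index in the sorted unique labels,
# i.e. consecutive category numbers 0, 1, 2, ...
_, labels = np.unique(b, return_inverse=True)

print(labels)            # [0 1 0 2 0 2 1]
column = labels[:, None] # shape (7, 1), one category per row as in the question
```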

Another approach is to use the edge list pairs to build a graph and extract its connected components.
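The connected-components idea can be sketched with a small union-find structure that uses only NumPy and the standard library. The helper `find` and the variable names are hypothetical, not taken from the answer. If SciPy is available, `scipy.sparse.csgraph.connected_components` applied to a sparse adjacency matrix built from the edge list is another route to the same labels.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [0, 4, 2],
              [4, 2, 5],
              [6, 1, 1],
              [1, 3, 5],
              [3, 0, 1],
              [0, 4, 2]])

pairs = np.argwhere((a[:, None] == a).any(axis=2))

# parent[i] points towards the representative (root) of row i's component.
parent = list(range(len(a)))

def find(i):
    # Walk up to the root, shortening the path along the way.
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i

for i, j in pairs.tolist():
    ri, rj = find(i), find(j)
    if ri != rj:
        # Union: the smaller root index becomes the representative.
        parent[max(ri, rj)] = min(ri, rj)

roots = np.array([find(i) for i in range(len(a))])

# Map the root indices to consecutive category numbers.
_, labels = np.unique(roots, return_inverse=True)
print(labels)   # [0 1 0 2 0 2 1]
```

Unlike the relabeling loop, which rescans the whole label array for every pair, each union here touches only a short chain of parent links, which may matter for large edge lists.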
