How to label groups of pairs in pandas?

Question

I have this dataframe:

>>> df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2], 'B': [2, 1, 2, 2.0, 1, 1, 2]})
>>> df
     A    B
0  1.0  2.0
1  2.0  1.0
2  1.0  2.0
3  NaN  2.0
4  2.0  1.0
5  2.0  1.0
6  2.0  2.0

I need to label each distinct pair (A, B) with a group number in a third column, "group id", to get something like this:

>>> df
     A    B  grup id                        explanation
0  1.0  2.0      1.0  <- group (1.0, 2.0), first group 
1  2.0  1.0      2.0  <- group (2.0, 1.0), second group
2  1.0  2.0      1.0  <- group (1.0, 2.0), first group 
3  NaN  2.0      NaN  <- invalid group                 
4  2.0  1.0      2.0  <- group (2.0, 1.0), second group
5  2.0  1.0      2.0  <- group (2.0, 1.0), second group
6  2.0  2.0      3.0  <- group (2.0, 2.0), third group 

How can I do this efficiently in pandas?

One idea is to first build a combined column (A, B), then identify the unique values in that column and map them back to my dataframe. But I suspect that a groupby() approach would be faster (and more elegant).

I tried:

>>> df.groupby(['A','B']).count()
Empty DataFrame
Columns: []
Index: [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]

So the index of this groupby() lists all the groups I need. But how do I then number them and map them back to my dataframe?

Recommended answer

You can use GroupBy.ngroup (pandas 0.20.2+). In the output below, the row whose key contains NaN is labeled -1:

print (df.groupby(['A','B']).ngroup())
0    0
1    1
2    0
3   -1
4    1
5    1
6    2
dtype: int64

df['grup id'] = df.groupby(['A','B']).ngroup().replace(-1,np.nan).add(1)
print (df)
     A    B  grup id
0  1.0  2.0      1.0
1  2.0  1.0      2.0
2  1.0  2.0      1.0
3  NaN  2.0      NaN
4  2.0  1.0      2.0
5  2.0  1.0      2.0
6  2.0  2.0      3.0
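Because NaN forces the column to float, the ids above come out as 1.0, 2.0, and so on. The following is a minimal sketch, assuming a pandas version that has the nullable `Int64` dtype, of keeping them as integers instead. The column name `group_id` is chosen here for illustration only. The `where` step is an assumption about version differences: depending on the pandas release, rows with a missing key may come back from `ngroup()` as -1 or as NaN, and `where` turns both into a missing value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2],
                   'B': [2, 1, 2, 2.0, 1, 1, 2]})

ids = df.groupby(['A', 'B']).ngroup()
# Keep only real group numbers (>= 0); -1 and NaN both become NaN.
ids = ids.where(ids >= 0)
# Shift to 1-based ids and store them as nullable integers
# (rows with an invalid pair hold <NA>).
df['group_id'] = (ids + 1).astype('Int64')
print(df)
```

The trade-off is that `Int64` columns hold `<NA>` rather than `np.nan`, so downstream code that checks `np.isnan` should use `Series.isna()` instead.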

An equivalent way to replace -1 and add 1, using np.where:

df['grup id'] = df.groupby(['A','B']).ngroup()
df['grup id'] = np.where(df['grup id'] == -1, np.nan, df['grup id'] + 1)
print (df)
     A    B  grup id
0  1.0  2.0      1.0
1  2.0  1.0      2.0
2  1.0  2.0      1.0
3  NaN  2.0      NaN
4  2.0  1.0      2.0
5  2.0  1.0      2.0
6  2.0  2.0      3.0
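One detail worth checking: `ngroup()` numbers groups in the order `groupby` would iterate over them, which by default is sorted key order. In the question's data that happens to match the order of first appearance. The sketch below uses a small hypothetical frame, `df2`, where the two orders differ, and assumes that passing `sort=False` to `groupby` is acceptable for your use case.

```python
import pandas as pd

# Hypothetical frame where the pair seen first, (2, 1), sorts after (1, 2).
df2 = pd.DataFrame({'A': [2, 1, 2], 'B': [1, 2, 1]})

# Default: groups are numbered in sorted key order.
by_key = df2.groupby(['A', 'B']).ngroup()
# sort=False: groups are numbered in order of first appearance.
by_seen = df2.groupby(['A', 'B'], sort=False).ngroup()

print(by_key.tolist())
print(by_seen.tolist())
```

If "first group" should mean "first pair encountered", as in the question's explanation column, `sort=False` is likely the safer choice.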

For older versions of pandas (below 0.20.2):

df['grup id'] = df.groupby(["A","B"]).grouper.group_info[0]
df['grup id'] = np.where(df['grup id'] == -1, np.nan, df['grup id'] + 1)
print (df)
     A    B  grup id
0  1.0  2.0      1.0
1  2.0  1.0      2.0
2  1.0  2.0      1.0
3  NaN  2.0      NaN
4  2.0  1.0      2.0
5  2.0  1.0      2.0
6  2.0  2.0      3.0
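For comparison, the idea mentioned in the question (collect the unique pairs, number them, map them back) can be sketched without groupby at all. This is only an illustration, not part of the original answer; the names `pairs`, `pair_id` and `labeled` are made up for the example, and no claim is made here about its speed relative to `ngroup()`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2],
                   'B': [2, 1, 2, 2.0, 1, 1, 2]})

# Unique valid pairs, in order of first appearance.
pairs = df[['A', 'B']].dropna().drop_duplicates().reset_index(drop=True)
# Number the pairs starting from 1.
pairs['pair_id'] = np.arange(1, len(pairs) + 1)

# Left merge keeps every row of df in its original order;
# the row with a NaN key finds no matching pair and gets NaN.
labeled = df.merge(pairs, on=['A', 'B'], how='left')
print(labeled)
```

Note that `merge` returns a new frame with a fresh integer index, so if `df` has a meaningful index it would need to be restored afterwards.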
