How to label groups of pairs in pandas?
Question
I have this dataframe:
>>> df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2], 'B': [2, 1, 2, 2.0, 1, 1, 2]})
>>> df
A B
0 1.0 2.0
1 2.0 1.0
2 1.0 2.0
3 NaN 2.0
4 2.0 1.0
5 2.0 1.0
6 2.0 2.0
I need to identify the groups of pairs (A, B) in a third column, "group id", to get something like this:
>>> df
A B grup id explanation
0 1.0 2.0 1.0 <- group (1.0, 2.0), first group
1 2.0 1.0 2.0 <- group (2.0, 1.0), second group
2 1.0 2.0 1.0 <- group (1.0, 2.0), first group
3 NaN 2.0 NaN <- invalid group
4 2.0 1.0 2.0 <- group (2.0, 1.0), second group
5 2.0 1.0 2.0 <- group (2.0, 1.0), second group
6 2.0 2.0 3.0 <- group (2.0, 2.0), third group
How can I do this efficiently in pandas?
One idea is to first build a combined column (A, B), then identify the unique values in that column and map them back to my dataframe. But I suspect that a groupby() approach would be faster (and more elegant).
I tried:
>>> df.groupby(['A','B']).count()
Empty DataFrame
Columns: []
Index: [(1.0, 2.0), (2.0, 1.0), (2.0, 2.0)]
So the index of this groupby() result lists all the groups I need. But how do I then number them and map them back to my dataframe?
Recommended answer
You can use GroupBy.ngroup (pandas 0.20.2+):
print (df.groupby(['A','B']).ngroup())
0 0
1 1
2 0
3 -1
4 1
5 1
6 2
dtype: int64
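As a side note on how the ids are assigned: with the default `sort=True`, `ngroup` numbers the groups in sorted key order, not in order of first appearance. The sketch below uses a small hypothetical frame (`demo`, not part of the question) to contrast this with `sort=False`. Rows with a missing key are filtered out before comparing, on the assumption that their label depends on the pandas version (`-1` as shown above, or `NaN` in more recent releases).

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose first-seen pair (2, 1) is not the smallest key.
demo = pd.DataFrame({'A': [2, 1, 2, np.nan], 'B': [1, 2, 1, 2.0]})

# Default (sort=True): ids follow the sorted order of the (A, B) keys.
sorted_ids = demo.groupby(['A', 'B']).ngroup()

# sort=False: ids follow the order in which each pair first appears.
seen_ids = demo.groupby(['A', 'B'], sort=False).ngroup()

# Compare only rows where both keys are present.
valid = demo[['A', 'B']].notna().all(axis=1)
print(sorted_ids[valid].astype(int).tolist())  # [1, 0, 1]
print(seen_ids[valid].astype(int).tolist())    # [0, 1, 0]
```

For the dataframe in the question both orders happen to coincide, so the choice only matters when the first pair seen is not the smallest one.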
df['grup id'] = df.groupby(['A','B']).ngroup().replace(-1,np.nan).add(1)
print (df)
A B grup id
0 1.0 2.0 1.0
1 2.0 1.0 2.0
2 1.0 2.0 1.0
3 NaN 2.0 NaN
4 2.0 1.0 2.0
5 2.0 1.0 2.0
6 2.0 2.0 3.0
An alternative way to replace -1 and add 1:
df['grup id'] = df.groupby(['A','B']).ngroup()
df['grup id'] = np.where(df['grup id'] == -1, np.nan, df['grup id'] + 1)
print (df)
A B grup id
0 1.0 2.0 1.0
1 2.0 1.0 2.0
2 1.0 2.0 1.0
3 NaN 2.0 NaN
4 2.0 1.0 2.0
5 2.0 1.0 2.0
6 2.0 2.0 3.0
For older versions of pandas (below 0.20.2):
df['grup id'] = df.groupby(["A","B"]).grouper.group_info[0]
df['grup id'] = np.where(df['grup id'] == -1, np.nan, df['grup id'] + 1)
print (df)
A B grup id
0 1.0 2.0 1.0
1 2.0 1.0 2.0
2 1.0 2.0 1.0
3 NaN 2.0 NaN
4 2.0 1.0 2.0
5 2.0 1.0 2.0
6 2.0 2.0 3.0
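For comparison, the combined-column idea mentioned in the question can be sketched with `pd.factorize`. This is a minimal illustration rather than part of the original answer: the names `valid`, `pairs`, and `group_id` are made up for the example, and ids are assigned in order of first appearance rather than in sorted key order.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 1, np.nan, 2, 2, 2],
                   'B': [2, 1, 2, 2.0, 1, 1, 2]})

# Rows where either key is missing get no group id.
valid = df[['A', 'B']].notna().all(axis=1)

# Combined column: one (A, B) tuple per valid row.
pairs = pd.Series(list(zip(df.loc[valid, 'A'], df.loc[valid, 'B'])),
                  index=df.index[valid])

# factorize assigns 0-based codes in order of first appearance.
codes, _ = pd.factorize(pairs)

# Start from all-NaN and fill in 1-based ids for the valid rows only.
group_id = pd.Series(np.nan, index=df.index)
group_id[valid] = codes + 1
print(group_id.tolist())  # [1.0, 2.0, 1.0, nan, 2.0, 2.0, 3.0]
```

Building Python tuples row by row is likely slower than `ngroup` on large frames, so this is mainly useful for seeing what the groupby approach does under the hood.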