Pandas: assign an index to each group identified by groupby

Question

When using groupby(), how can I create a DataFrame with a new column containing the group number of each row, similar to dplyr::group_indices in R? For example, if I have

>>> import pandas as pd
>>> df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 1, 2, 1, 1, 2]})
>>> df
   a  b
0  1  1
1  1  1
2  1  2
3  2  1
4  2  1
5  2  2

how can I get a DataFrame like this?

   a  b  idx
0  1  1  1
1  1  1  1
2  1  2  2
3  2  1  3
4  2  1  3
5  2  2  4

(The order in which the idx values are assigned doesn't matter.)

Recommended answer

Here's a concise way to get a unique identifier per group using drop_duplicates and merge.

group_vars = ['a','b']
df.merge( df.drop_duplicates( group_vars ).reset_index(), on=group_vars )

   a  b  index
0  1  1      0
1  1  1      0
2  1  2      2
3  2  1      3
4  2  1      3
5  2  2      5

The identifier in this case goes 0, 2, 3, 5, because each group keeps the original index label of its first row. It can easily be changed to 0, 1, 2, 3 by calling reset_index(drop=True) on the deduplicated frame before the final reset_index().
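A minimal sketch of that variant, using the question's sample data. The column name idx and the rename_axis step are choices made here for illustration, not part of the original answer; how='left' is passed explicitly only to make the row-order guarantee obvious.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 1, 2, 1, 1, 2]})
group_vars = ['a', 'b']

# Keep one row per unique (a, b) pair, discard the old index labels,
# then turn the fresh 0..n-1 index into a column named 'idx'.
keys = (
    df.drop_duplicates(group_vars)
      .reset_index(drop=True)
      .rename_axis('idx')
      .reset_index()
)

# Attach the consecutive identifier back onto every original row.
result = df.merge(keys, on=group_vars, how='left')
print(result)
```

Adding 1 to the idx column would reproduce the 1-based numbering shown in the question.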

Update: Newer versions of pandas (0.20.2 and later) offer a simpler way to do this with the ngroup method, as noted in a comment on the question by @Constantino and in a subsequent answer by @CalumYou. I'll leave this answer here as an alternative approach, but ngroup seems like the better way to do this in most cases.
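For comparison, a short sketch of the ngroup approach on the same sample data. The + 1 offset is an assumption added here only to match the 1-based idx column from the question; ngroup itself numbers groups from 0.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 1, 2, 1, 1, 2]})

# ngroup() labels each row with the number of the group it belongs to,
# starting at 0; with the default sort=True, groups are numbered in
# sorted key order.
df['idx'] = df.groupby(['a', 'b']).ngroup() + 1
print(df)
```

Unlike the merge approach, this does not build an intermediate frame of unique keys.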
