Pandas: assign an index to each group identified by groupby
Question
When using groupby(), how can I create a DataFrame with a new column containing the group number of each row, similar to dplyr::group_indices in R? For example, if I have
>>> df=pd.DataFrame({'a':[1,1,1,2,2,2],'b':[1,1,2,1,1,2]})
>>> df
   a  b
0  1  1
1  1  1
2  1  2
3  2  1
4  2  1
5  2  2
how can I get a DataFrame like
   a  b  idx
0  1  1    1
1  1  1    1
2  1  2    2
3  2  1    3
4  2  1    3
5  2  2    4
(the order of the idx values doesn't matter)
Recommended answer
Here's a concise way to get a unique identifier per group using drop_duplicates and merge.
group_vars = ['a','b']
df.merge( df.drop_duplicates( group_vars ).reset_index(), on=group_vars )
   a  b  index
0  1  1      0
1  1  1      0
2  1  2      2
3  2  1      3
4  2  1      3
5  2  2      5
The identifier in this case goes 0, 2, 3, 5 (a leftover of the original index: each group gets the row label of its first occurrence), but this can easily be changed to 0, 1, 2, 3 by adding reset_index(drop=True) before the existing reset_index().
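As a rough sketch of that variation, the snippet below chains the extra reset_index(drop=True) so the identifiers come out consecutive. The names keys and result, and the rename of the index column to idx, are illustrative choices rather than part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 1, 2, 1, 1, 2]})
group_vars = ['a', 'b']

# One row per distinct (a, b) combination, in order of first appearance.
keys = (
    df.drop_duplicates(group_vars)
      .reset_index(drop=True)            # discard the original labels 0, 2, 3, 5
      .reset_index()                     # turn the fresh labels 0..3 into a column
      .rename(columns={'index': 'idx'})  # hypothetical column name, chosen here
)

# Attach the identifier to every row of the original frame.
result = df.merge(keys, on=group_vars)
print(result)
```

With this data the idx column should read 0, 0, 1, 2, 2, 3. Add 1 to it if identifiers starting at 1 are preferred, as in the question.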
Update: Newer versions of pandas (0.20.2 and later) offer a simpler way to do this with the ngroup method, as noted in a comment on the question above by @Constantino and in a subsequent answer by @CalumYou. I'll leave this answer here as an alternative approach, but ngroup seems like the better way to do this in most cases.
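For reference, a minimal sketch of the ngroup approach on the same example data. The column name idx and the + 1 offset are assumptions made here to match the output requested in the question, since ngroup itself numbers groups from 0.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 1, 2, 2, 2], 'b': [1, 1, 2, 1, 1, 2]})

# ngroup() numbers each group starting from 0; with the default sort=True
# the numbering follows the sorted order of the group keys.
df['idx'] = df.groupby(['a', 'b']).ngroup() + 1  # + 1 only to start at 1
print(df)
```

Passing sort=False to groupby should instead number the groups in order of first appearance, which gives the same result for this particular data.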