Faster way to rank rows in subgroups in pandas dataframe

Question

I have a pandas data frame that is composed of different subgroups.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'id': [1, 2, 3, 4, 5, 6, 7, 8],
        'group': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
        'value': [.01, .4, .2, .3, .11, .21, .4, .01]
    })

I want to find the rank of each id within its group, with, say, lower values being better. In the example above, in group a, id 1 would have a rank of 1 and id 2 a rank of 4. In group b, id 5 would have a rank of 2, id 8 a rank of 1, and so on.

Right now I assess the ranks by:

  1. Sorting by value.

    # df.sort() was removed in pandas 0.20; sort_values() is the current equivalent
    df.sort_values('value', ascending=True, inplace=True)

  2. Creating a ranker function (it assumes the rows are already sorted).

    def ranker(df):
        # Number the rows 1..n in their current (sorted) order
        df['rank'] = np.arange(len(df)) + 1
        return df

  3. Applying the ranker function to each group separately.

    df = df.groupby(['group']).apply(ranker)

This process works, but it is really slow when I run it on millions of rows of data. Does anyone have any ideas on how to make a faster ranker function?

Recommended answer

rank is cythonized, so it should be very fast. You can pass it the same options as df.rank(); see the pandas documentation for rank. As the docs show, ties can be broken in one of five different ways via the method argument.
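To make the tie-breaking options concrete, here is a minimal sketch that ranks the same values with each of the five method settings. The small frame named ties is hypothetical data invented for illustration (the question's data has no ties within a group), and the names grouped, methods, and ranks are likewise not from the original post.

```python
import pandas as pd

# Hypothetical data with a tie inside group 'a' (not from the question).
ties = pd.DataFrame({
    'group': ['a', 'a', 'a', 'b', 'b'],
    'value': [0.2, 0.2, 0.1, 0.5, 0.3],
})

grouped = ties.groupby('group')['value']

# Build one column per tie-breaking method accepted by rank().
methods = ['average', 'min', 'max', 'first', 'dense']
ranks = pd.DataFrame({m: grouped.rank(method=m) for m in methods})

print(pd.concat([ties, ranks], axis=1))
```

With the default, average, the two tied rows in group a share rank 2.5. With first, they get 2 and 3 in order of appearance, which is the closest match to numbering rows after a sort.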

It's also possible that you simply want the .cumcount() of the group.

    In [12]: df.groupby('group')['value'].rank(ascending=False)
    Out[12]: 
    0    4
    1    1
    2    3
    3    2
    4    3
    5    2
    6    1
    7    4
    dtype: float64
