Ranking order per group in Pandas


Question

Consider a DataFrame with three columns: group_ID, item_ID and value. Say we have 10 item IDs in total.

I need to rank each item_ID (1 to 10) within each group_ID based on value, and then look at the mean rank (and other statistics) of each item across groups (e.g. the IDs with the highest values across groups would get mean ranks closer to 1). How can I do this in Pandas?

This answer does something very close with qcut, but not exactly the same.

Example data:

      group_ID   item_ID  value
0   0S00A1HZEy        AB     10
1   0S00A1HZEy        AY      4
2   0S00A1HZEy        AC     35
3   0S03jpFRaC        AY     90
4   0S03jpFRaC        A5      3
5   0S03jpFRaC        A3     10
6   0S03jpFRaC        A2      8
7   0S03jpFRaC        A4      9
8   0S03jpFRaC        A6      2
9   0S03jpFRaC        AX      0

This should result in:

      group_ID   item_ID   rank
0   0S00A1HZEy        AB      2
1   0S00A1HZEy        AY      3
2   0S00A1HZEy        AC      1
3   0S03jpFRaC        AY      1
4   0S03jpFRaC        A5      5
5   0S03jpFRaC        A3      2
6   0S03jpFRaC        A2      4
7   0S03jpFRaC        A4      3
8   0S03jpFRaC        A6      6
9   0S03jpFRaC        AX      7

Recommended answer

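There are lots of different arguments you can pass to rank. It looks like you can use rank("dense", ascending=False) after a groupby to get the results you want: "dense" gives tied values the same rank without leaving gaps, and ascending=False assigns rank 1 to the largest value in each group.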

>>> df["rank"] = df.groupby("group_ID")["value"].rank("dense", ascending=False)
>>> df
     group_ID item_ID  value  rank
0  0S00A1HZEy      AB     10     2
1  0S00A1HZEy      AY      4     3
2  0S00A1HZEy      AC     35     1
3  0S03jpFRaC      AY     90     1
4  0S03jpFRaC      A5      3     5
5  0S03jpFRaC      A3     10     2
6  0S03jpFRaC      A2      8     4
7  0S03jpFRaC      A4      9     3
8  0S03jpFRaC      A6      2     6
9  0S03jpFRaC      AX      0     7
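
For anyone who wants to reproduce this, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same groupby and rank call. Passing method="dense" by keyword and casting the result to int are choices made for this sketch and are not part of the original answer. rank returns floats, so the cast only makes the output match the integer table above, and it is safe here because the sample data has no missing values.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_ID": ["0S00A1HZEy"] * 3 + ["0S03jpFRaC"] * 7,
    "item_ID": ["AB", "AY", "AC", "AY", "A5", "A3", "A2", "A4", "A6", "AX"],
    "value": [10, 4, 35, 90, 3, 10, 8, 9, 2, 0],
})

# Rank values within each group; the largest value in a group gets rank 1.
# method="dense" gives tied values the same rank and leaves no gaps.
ranks = df.groupby("group_ID")["value"].rank(method="dense", ascending=False)

# rank() returns floats; cast to int for display (assumes no NaN values).
df["rank"] = ranks.astype(int)

print(df)
```

If value can contain NaN, the cast to int will fail, so in that case it is probably better to keep the float ranks as they are.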

But note that if you're not using a global ranking scheme, finding the mean rank within each group isn't very meaningful. Unless a group contains duplicate values (and therefore duplicate ranks), its ranks are simply 1 to n, so their mean is (n + 1) / 2 and all you're measuring is how many elements there are in the group.
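
The caveat can be checked on the sample data with a short sketch. The variable names per_group and per_item are illustrative only, and the per-item aggregation is one reading of "mean rank across groups" from the question rather than something the original answer shows.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "group_ID": ["0S00A1HZEy"] * 3 + ["0S03jpFRaC"] * 7,
    "item_ID": ["AB", "AY", "AC", "AY", "A5", "A3", "A2", "A4", "A6", "AX"],
    "value": [10, 4, 35, 90, 3, 10, 8, 9, 2, 0],
})
df["rank"] = df.groupby("group_ID")["value"].rank(method="dense", ascending=False)

# Mean rank per group: with no ties this is just (n + 1) / 2,
# so it only reflects the group size.
per_group = df.groupby("group_ID")["rank"].agg(["mean", "count"])
per_group["size_based_mean"] = (per_group["count"] + 1) / 2

# Mean rank per item across the groups it appears in, plus how many
# groups contributed to that mean.
per_item = df.groupby("item_ID")["rank"].agg(["mean", "count"])

print(per_group)
print(per_item)
```

Because the groups here have different sizes (3 and 7 items), a rank of 3 means "last" in one group and "near the top" in the other, so per-item means taken over such ranks should be interpreted with care.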
