pandas 按多列排名 [英] Pandas rank by multiple columns

查看:94
本文介绍了 pandas 按多列排名的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图基于两列对熊猫数据框进行排名. 我可以基于一列对其进行排名,但是如何基于两列对其进行排名呢? "SaleCount",然后是"TotalRevenue"?

I am trying to rank a pandas data frame based on two columns. I can rank it based on one column, but how can to rank it based on two columns? 'SaleCount', then 'TotalRevenue'?

import pandas as pd

df = pd.DataFrame({'TotalRevenue':[300,9000,1000,750,500,2000,0,600,50,500],
    'Date':['2016-12-02' for i in range(10)],
    'SaleCount':[10,100,30,35,20,100,0,30,2,20],
    'shops':['S3','S2','S1','S5','S4','S8','S6','S7','S9','S10']})

df['Rank'] = df.SaleCount.rank(method='dense',ascending = False).astype(int)

#df['Rank'] = df.TotalRevenue.rank(method='dense',ascending = False).astype(int)
df.sort_values(['Rank'], inplace=True)

print(df)

当前输出:

    Date        SaleCount   TotalRevenue    shops   Rank
1   2016-12-02  100          9000            S2      1
5   2016-12-06  100          2000            S8      1
3   2016-12-04  35           750             S5      2
2   2016-12-03  30           1000            S1      3
7   2016-12-08  30           600             S7      3
9   2016-12-10  20           500             S10     4
4   2016-12-05  20           500             S4      4
0   2016-12-01  10           300             S3      5
8   2016-12-09  2            50              S9      6
6   2016-12-07  0            0               S6      7

我正在尝试生成这样的输出:

I'm trying to generate an output like this:

    Date        SaleCount   TotalRevenue    shops   Rank
1   2016-12-02  100          9000            S2      1
5   2016-12-02  100          2000            S8      2
3   2016-12-02  35           750             S5      3
2   2016-12-02  30           1000            S1      4
7   2016-12-02  30           600             S7      5
9   2016-12-02  20           500             S10     6
4   2016-12-02  20           500             S4      6
0   2016-12-02  10           300             S3      7
8   2016-12-02  2            50              S9      8
6   2016-12-02  0            0               S6      9

推荐答案

另一种方法是将两个感兴趣的列都类型转换为str,并通过串联将它们组合在一起.将这些值转换回数值,以便可以根据其大小来区分它们.

Another way would be to type-cast both the columns of interest to str and combine them by concatenating them. Convert these back to numerical values so that they could be differentiated based on their magnitude.

method=dense中,重复值的等级将保持不变. (在这里:6)

In method=dense, ranks of duplicated values would remain unchanged. (Here: 6)

由于您要按降序对它们进行排名,因此请在ascending=False rel ="noreferrer"> Series.rank() 将使您获得所需的结果.

Since you want to rank these in their descending order, specifying ascending=False in Series.rank() would let you achieve the desired result.

col1 = df["SaleCount"].astype(str) 
col2 = df["TotalRevenue"].astype(str)
df['Rank'] = (col1+col2).astype(int).rank(method='dense', ascending=False).astype(int)
df.sort_values('Rank')

这篇关于 pandas 按多列排名的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆