Compute rank average for multiple columns manually

Views: 43 Posted: 2021/6/13 20:49:28 Tags: python pandas rank

Problem description

I am looking for a way to generate a ranking, using average as the method for ties, based on multiple columns where one contains strings and the other integers (there could easily be more than 2 columns, but I'm limiting it to 2 for a simpler example).

import pandas as pd
df = pd.DataFrame(data={'String':['a','a','a','a','b','b','c','c','c','c'],'Integer':[1,2,3,3,1,3,6,4,4,4]})
print(df)
  String  Integer
0      a        1
1      a        2
2      a        3
3      a        3
4      b        1
5      b        3
6      c        6
7      c        4
8      c        4
9      c        4

The idea is to create a ranking that orders the rows by String in descending order and by Integer in ascending order. This would be the output:

    Rank String  Integer
0      2      c        4
1      2      c        4
2      2      c        4
3      4      c        6
4      5      b        1
5      6      b        3
6      7      a        1
7      8      a        2
8    9.5      a        3
9    9.5      a        3
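
To illustrate what "average" means in the expected output: rows that tie share the mean of the positions they occupy, so the three c/4 rows at positions 1, 2 and 3 all get rank 2. A minimal sketch on a single column, using the pandas built-in Series.rank (the variable names s and ranks are only for illustration):

```python
import pandas as pd

# Integer values of the four 'c' rows, already in ascending order.
s = pd.Series([4, 4, 4, 6])

# method='average' gives tied values the mean of the positions they span:
# positions 1, 2, 3 -> 2.0 for each of the three 4s; the 6 sits at position 4.
ranks = s.rank(method='average')
print(ranks.tolist())  # [2.0, 2.0, 2.0, 4.0]
```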

So far this is what I have managed to do, but I'm having trouble generating the 'average' when a rank is shared.

df['concat_values'] = df['String'] + df['Integer'].astype(str)  
df = df.sort_values(['String','Integer'],ascending=[False,True])
df = df.reset_index(drop=True).reset_index()
df['repeated'] = df.groupby('concat_values')['concat_values'].transform('count')
df['pre_rank'] = df['index'] + 1
df = df.sort_values('pre_rank')
df = df.drop('index',axis=1)
print(df)
  String  Integer concat_values  repeated  pre_rank
0      c        4            c4         3         1
1      c        4            c4         3         2
2      c        4            c4         3         3
3      c        6            c6         1         4
4      b        1            b1         1         5
5      b        3            b3         1         6
6      a        1            a1         1         7
7      a        2            a2         1         8
8      a        3            a3         2         9
9      a        3            a3         2        10

I thought of using some filter or formula so that, when the repeated column is greater than 1, a function that returns the average is applied to pre_rank. But I can't generalize that function to all rows: it works for the first row of a tied group, but it yields a higher value for the second one (because pre_rank is higher there). I believe I'm just missing the final step, but I can't work it out. Thanks!
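
One possible way to take that final step, offered as a sketch rather than a definitive answer: since every row in a tied group already has its own pre_rank, averaging pre_rank within each group of identical (String, Integer) values yields the shared rank. The example below assumes grouping directly on the two columns instead of on concat_values (which avoids ambiguous concatenations such as 'a1' + '1' versus 'a' + '11'); the Rank column name is taken from the expected output.

```python
import pandas as pd

df = pd.DataFrame(data={
    'String': ['a', 'a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c'],
    'Integer': [1, 2, 3, 3, 1, 3, 6, 4, 4, 4],
})

# Sort by String descending and Integer ascending, then number the rows 1..n.
df = df.sort_values(['String', 'Integer'], ascending=[False, True])
df = df.reset_index(drop=True)
df['pre_rank'] = df.index + 1

# Rows with the same (String, Integer) pair share the mean of their positions.
df['Rank'] = df.groupby(['String', 'Integer'])['pre_rank'].transform('mean')

print(df[['Rank', 'String', 'Integer']])
```

Because transform returns one value per original row, the result lines up with df without any filtering on the repeated column; untied rows simply keep their own pre_rank as a float.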
