pandas group by year, rank by sales column, in a dataframe with duplicate data

Problem description

I would like to create a rank within each year (so in 2012, Manager B is 1; in 2011, Manager B is 1 again). I struggled with the pandas rank function for a while and do NOT want to resort to a for loop.

import pandas as pd
s = pd.DataFrame([['2012','A',3],['2012','B',8],['2011','A',20],['2011','B',30]], columns=['Year','Manager','Return'])

Out[1]:     
   Year Manager  Return    
0  2012       A       3    
1  2012       B       8    
2  2011       A      20    
3  2011       B      30


The issue I'm having is with the additional code (I didn't think it would be relevant before):

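import pandas as pd

s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])

# DataFrame.append was removed in pandas 2.0; pd.concat([s, b]) is the equivalent there.
s = s.append(b)  # keeps the original labels, so the index now contains 0..3 twice
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)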

raise Exception('Reindexing only valid with uniquely valued Index '
Exception: Reindexing only valid with uniquely valued Index objects

Any ideas?
This is the real data structure I am using. I've been having trouble re-indexing.

Recommended answer

It sounds like you want to group by Year, then rank the Return values in descending order.

import pandas as pd
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])
s['Rank'] = s.groupby(['Year'])['Return'].rank(ascending=False)
print(s)

yields

   Year Manager  Return  Rank
0  2012       A       3   2.0
1  2012       B       8   1.0
2  2011       A      20   2.0
3  2011       B      30   1.0
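Note that `rank` returns float64 values: the default `method='average'` can produce ranks such as 1.5 when values tie. If whole-number ranks are wanted, one option is sketched below, assuming the same single-copy data as above: `method='dense'` gives tied values the same rank without leaving gaps, so every rank is an integer and the column can be cast with `astype(int)`.

```python
import pandas as pd

# Same single-copy data as in the answer above.
s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]],
                 columns=['Year', 'Manager', 'Return'])

# Default rank: float64, because method='average' may yield fractional ranks on ties.
s['Rank'] = s.groupby('Year')['Return'].rank(ascending=False)
print(s['Rank'].dtype)  # float64

# method='dense' always yields whole numbers (ties share a rank, no gaps),
# so casting to int is safe here.
s['Rank'] = s.groupby('Year')['Return'].rank(method='dense', ascending=False).astype(int)
print(s)
```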


To address the OP's revised question: The error message

ValueError: cannot reindex from a duplicate axis

occurs when trying to groupby/rank on a DataFrame with duplicate values in the index. You can avoid the problem by constructing s to have unique index values after appending:

import pandas as pd

s = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame([['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]], columns=['Year', 'Manager', 'Return'])
# DataFrame.append was removed in pandas 2.0; pd.concat takes the same ignore_index flag.
s = pd.concat([s, b], ignore_index=True)

yields

   Year Manager  Return
0  2012       A       3
1  2012       B       8
2  2011       A      20
3  2011       B      30
4  2012       A       3
5  2012       B       8
6  2011       A      20
7  2011       B      30
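A quick way to see why `ignore_index=True` matters is to check `Index.is_unique` on both results. The sketch below assumes the same `s` and `b` frames and uses `pd.concat`, since `DataFrame.append` no longer exists in pandas 2.0; the names `dup` and `uniq` are only illustrative.

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])

# Without ignore_index, each frame keeps its own labels, so 0..3 appear twice.
dup = pd.concat([s, b])
print(dup.index.tolist())   # [0, 1, 2, 3, 0, 1, 2, 3]
print(dup.index.is_unique)  # False

# With ignore_index=True, the result gets a fresh 0..7 index.
uniq = pd.concat([s, b], ignore_index=True)
print(uniq.index.tolist())  # [0, 1, 2, 3, 4, 5, 6, 7]
print(uniq.index.is_unique) # True
```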


If you've already appended new rows using

s = pd.concat([s, b])  # s = s.append(b) in pandas < 2.0

then use reset_index to create a unique index:

s = s.reset_index(drop=True)
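Once the index is unique, keep in mind that the appended data repeats every row, so each year now contains ties and the `method` argument of `rank` decides how they are scored. A minimal sketch, assuming the duplicated `s`/`b` frames from the question (the column names `RankAvg` and `RankDense` are hypothetical):

```python
import pandas as pd

rows = [['2012', 'A', 3], ['2012', 'B', 8], ['2011', 'A', 20], ['2011', 'B', 30]]
s = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])
b = pd.DataFrame(rows, columns=['Year', 'Manager', 'Return'])

# Append the duplicate rows, then rebuild a unique 0..n-1 index.
s = pd.concat([s, b]).reset_index(drop=True)

ranked = s.groupby('Year')['Return']
# Default method='average': tied values share the mean of the ranks they occupy.
s['RankAvg'] = ranked.rank(ascending=False)
# method='dense': tied values share a rank, and the next distinct value gets rank + 1.
s['RankDense'] = ranked.rank(method='dense', ascending=False)
print(s)
```

With `'average'`, each manager's two identical rows split ranks 1 and 2 (or 3 and 4), giving 1.5 and 3.5; with `'dense'`, Manager B gets 1 and Manager A gets 2 in both years, which matches the ranking described in the question.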
