pandas sort a column by values in another column


Question

I have a dataset that I want to sort, and then assign ranks based on the sorted order.

Suppose it has two columns: one is the year, and the other is the column I want to sort.

import pandas as pd
data = {'year': pd.Series([2006, 2006, 2007, 2007]), 
        'value': pd.Series([5, 10, 4, 1])}
df = pd.DataFrame(data)

I want to sort the 'value' column within each year and then rank it. The result I would like to have is:

data2 = {'year': pd.Series([2006, 2006, 2007, 2007]),
         'value': pd.Series([10, 5, 4, 1]),
         'rank': pd.Series([1, 2, 1, 2])}
df2 = pd.DataFrame(data2)

>>> df2
   rank  value  year
0     1     10  2006
1     2      5  2006
2     1      4  2007
3     2      1  2007

Recommended answer

You can group by 'year' with groupby and then call rank on the 'value' column (with ascending=False so the largest value gets rank 1). You don't need groupby to sort the group keys, because the ranks come back aligned to the original DataFrame's index; passing sort=False skips that step and is slightly faster.

df['yearly_rank'] = df.groupby('year', sort=False)['value'].rank(ascending=False)

>>> df.sort_values(['year', 'yearly_rank'])
   value  year  yearly_rank
1     10  2006          1.0
0      5  2006          2.0
2      4  2007          1.0
3      1  2007          2.0
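
To get exactly the layout asked for in the question (rows sorted within each year, whole-number ranks, and a fresh 0..n-1 index), the steps above can be put together roughly as follows. This is only a sketch: the column name 'rank', the choice of method='first' for breaking ties, and the cast to int are assumptions added here, not part of the original answer. rank returns floats by default, which is why the cast is included.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2006, 2006, 2007, 2007],
    'value': [5, 10, 4, 1],
})

# Rank within each year, largest value first.
# Assumption: method='first' breaks ties by order of appearance, so every
# rank is a distinct whole number and the cast to int is safe. The default
# method='average' can produce fractional ranks when values tie.
df['rank'] = (
    df.groupby('year', sort=False)['value']
      .rank(ascending=False, method='first')
      .astype(int)
)

# Order the rows by year, then by rank, and renumber the index from 0.
result = df.sort_values(['year', 'rank']).reset_index(drop=True)
print(result)
```

If tied values should share a rank instead, method='min' or method='dense' may be a better fit than method='first'.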
