Python pandas rank/sort based on another column that differs for each input
Problem description
I would like to come up with the 4th column below based on the first three:
user  job      time  Rank
A     print    1559  2
A     print    1540  2
A     edit     1520  1
A     edit     1523  1
A     deliver  9717  3
B     edit     1717  2
B     edit     1716  2
B     edit     1715  2
B     deliver  1527  1
B     deliver  1524  1
The ranking in the 4th column is independent for each user (1st column). For each user, I would like to rank the jobs in the 2nd column based on the values in the 3rd column. E.g. user A has three jobs to be ranked. Because the time value of 'edit' is the smallest, 'print' is next and 'deliver' is the largest, the ranking for the three is edit - 1, print - 2 and deliver - 3.
I know I should start by grouping on the first column with groupby, but I cannot figure out how to rank the 2nd column based on the 3rd, whose value is different for each row.
Recommended answer
First, assign a new column that contains the minimum time for each user-job pair:
df['min_time'] = df.groupby(['user', 'job'])['time'].transform('min')
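Then group by user and rank those minimum times within each group. With method='dense', rows that share the same minimum get the same rank and no rank numbers are skipped: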
df.groupby('user')['min_time'].rank(method='dense').astype(int)
Out:
0    2
1    2
2    1
3    1
4    3
5    2
6    2
7    2
8    1
9    1
Name: min_time, dtype: int64
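For anyone who wants to reproduce this end to end, the two steps above can be put together roughly as follows. This is only a sketch: the way the DataFrame is built is an assumption (the question shows the table but not how it was created), and storing the result in a column named Rank simply mirrors the header of the question's table.

```python
import pandas as pd

# Sample data copied from the question's table.
# How the frame is constructed is an assumption; the original does not show it.
df = pd.DataFrame({
    "user": ["A"] * 5 + ["B"] * 5,
    "job": ["print", "print", "edit", "edit", "deliver",
            "edit", "edit", "edit", "deliver", "deliver"],
    "time": [1559, 1540, 1520, 1523, 9717,
             1717, 1716, 1715, 1527, 1524],
})

# Step 1: smallest time per (user, job) pair, broadcast back onto every row.
df["min_time"] = df.groupby(["user", "job"])["time"].transform("min")

# Step 2: dense rank of that minimum within each user.
# "Rank" as the target column name follows the question's table header.
df["Rank"] = df.groupby("user")["min_time"].rank(method="dense").astype(int)

print(df)
```

The helper column min_time is what lets every row of the same job share one value, so the dense rank comes out identical for all rows of that job. It can be dropped afterwards with df.drop(columns="min_time") if it is not needed.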