Sorting and ranking by dates, on a group in a pandas df

Problem description

From the following sort of dataframe, I would like to both sort and rank each id by date:

import pandas as pd

df = pd.DataFrame({
'id':[1, 1, 2, 3, 3, 4, 5, 6,6,6,7,7], 
'value':[.01, .4, .2, .3, .11, .21, .4, .01, 3, .5, .8, .9],
'date':['10/01/2017 15:45:00','05/01/2017 15:56:00',
        '11/01/2017 15:22:00','06/01/2017 11:02:00','05/01/2017 09:37:00',
        '05/01/2017 09:55:00','05/01/2017 10:08:00','03/02/2017 08:55:00',
        '03/02/2017 09:15:00','03/02/2017 09:31:00','09/01/2017 15:42:00',
        '19/01/2017 16:34:00']})

The aim is to rank (or index) the rows of each id by date.

I have used

df.groupby('id')['date'].min()

which allows me to extract the first date (although I don't know how to use it to filter the rows). However, I won't always need the first date; sometimes it will be the second or third, so I need to generate a new column with an index for the date. The result would look like:

Any ideas on this sorting/ranking/labelling?
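On the side question of turning `df.groupby('id')['date'].min()` into a row filter, here is a minimal sketch. It assumes the dates are day-first strings matching `%d/%m/%Y %H:%M:%S` (as the sample suggests) and parses them with `pd.to_datetime`. `transform('min')` returns each id's minimum aligned to every row, so it can be compared with the column directly; picking the second or third date works the same way with a dense rank.

```python
import pandas as pd

# Sample data from the question (dates are dd/mm/yyyy strings)
df = pd.DataFrame({
    'id': [1, 1, 2, 3, 3, 4, 5, 6, 6, 6, 7, 7],
    'value': [.01, .4, .2, .3, .11, .21, .4, .01, 3, .5, .8, .9],
    'date': ['10/01/2017 15:45:00', '05/01/2017 15:56:00',
             '11/01/2017 15:22:00', '06/01/2017 11:02:00', '05/01/2017 09:37:00',
             '05/01/2017 09:55:00', '05/01/2017 10:08:00', '03/02/2017 08:55:00',
             '03/02/2017 09:15:00', '03/02/2017 09:31:00', '09/01/2017 15:42:00',
             '19/01/2017 16:34:00']})

# Assumption: every value is day-first; parsing makes comparisons chronological
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M:%S')

# transform('min') broadcasts each id's earliest date back onto all of its rows
first_date = df.groupby('id')['date'].transform('min')
first_rows = df[df['date'] == first_date]

# For the n-th distinct date, rank within each id (ties share a rank) and filter
rank = df.groupby('id')['date'].rank(method='dense')
second_rows = df[rank == 2]
print(first_rows)
print(second_rows)
```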

My original model ignored a very prevalent issue.

Some ids may have multiple tests performed on them in parallel, so they appear in multiple rows of the database with matching dates (date corresponds to when they were logged). These should be counted as the same date and should not increment the date_rank. I've generated a model with an updated date_rank to demonstrate how this would look:

import pandas as pd

df = pd.DataFrame({
'id':[1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6,6,6,7,7], 
'value':[.01, .4, .5, .7, .77, .1,.2, 0.3, .11, .21, .4, .01, 3, .5, .8, .9, .1],
'date':['10/01/2017 15:45:00','10/01/2017 15:45:00','05/01/2017 15:56:00',
        '11/01/2017 15:22:00','11/01/2017 15:22:00','06/01/2017 11:02:00','05/01/2017 09:37:00','05/01/2017 09:37:00','05/01/2017 09:55:00',
        '05/01/2017 09:55:00','05/01/2017 10:08:00','05/01/2017 10:09:00','03/02/2017 08:55:00',
        '03/02/2017 09:15:00','03/02/2017 09:31:00','09/01/2017 15:42:00',
        '19/01/2017 16:34:00']})

And the counter would give:

Recommended answer

You can try sorting the date values in descending order and then aggregating within each 'id' group.

@praveen's logic is much simpler. Extending it, you can use astype('category') to convert the values to categories and retrieve the codes (keys) of those categories. The result is a little different from your expected output, because the codes follow the sorted order of the date strings, so the smallest date in each id gets 1:

df1 = df.sort_values(['id', 'date'], ascending=[True, False])
# Category codes follow the sorted order of each id's unique date strings; +1 makes them 1-based
df1['date_rank'] = df1.groupby('id')['date'].transform(
    lambda x: x.astype('category').cat.codes + 1)

Out:

                 date   id  value   date_rank
0   10/01/2017 15:45:00 1   0.01    2
1   10/01/2017 15:45:00 1   0.40    2
2   05/01/2017 15:56:00 1   0.50    1
3   11/01/2017 15:22:00 2   0.70    1
4   11/01/2017 15:22:00 2   0.77    1
5   06/01/2017 11:02:00 3   0.10    2
6   05/01/2017 09:37:00 3   0.20    1
7   05/01/2017 09:37:00 3   0.30    1
8   05/01/2017 09:55:00 4   0.11    1
9   05/01/2017 09:55:00 4   0.21    1
11  05/01/2017 10:09:00 5   0.01    2
10  05/01/2017 10:08:00 5   0.40    1
14  03/02/2017 09:31:00 6   0.80    3
13  03/02/2017 09:15:00 6   0.50    2
12  03/02/2017 08:55:00 6   3.00    1
16  19/01/2017 16:34:00 7   0.10    2
15  09/01/2017 15:42:00 7   0.90    1
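Because `date` holds `dd/mm/yyyy` strings, sorting compares characters rather than calendar dates. On this sample the two orders happen to agree, but across months they would not (for example, `'10/01/2017'` sorts after `'03/02/2017'`). A minimal sketch that avoids this, assuming every value matches the day-first format `%d/%m/%Y %H:%M:%S`, parses the column with `pd.to_datetime` and uses `groupby(...).rank(method='dense')`, which gives tied timestamps the same rank and does not skip numbers after a tie. On this data it should reproduce the `date_rank` column above (earliest date = 1).

```python
import pandas as pd

# Updated sample data from the question; some ids repeat the same timestamp
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7],
    'value': [.01, .4, .5, .7, .77, .1, .2, .3, .11, .21, .4, .01, 3, .5, .8, .9, .1],
    'date': ['10/01/2017 15:45:00', '10/01/2017 15:45:00', '05/01/2017 15:56:00',
             '11/01/2017 15:22:00', '11/01/2017 15:22:00', '06/01/2017 11:02:00',
             '05/01/2017 09:37:00', '05/01/2017 09:37:00', '05/01/2017 09:55:00',
             '05/01/2017 09:55:00', '05/01/2017 10:08:00', '05/01/2017 10:09:00',
             '03/02/2017 08:55:00', '03/02/2017 09:15:00', '03/02/2017 09:31:00',
             '09/01/2017 15:42:00', '19/01/2017 16:34:00']})

# Assumption: day-first format; parsing makes the ordering chronological
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M:%S')

# method='dense': equal timestamps share a rank, the next distinct date gets rank + 1
df['date_rank'] = df.groupby('id')['date'].rank(method='dense').astype(int)
print(df.sort_values(['id', 'date'], ascending=[True, False]))
```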

But to get your exact output, here I used a dictionary that maps each unique date of an id (in the sorted, descending order) to its position, so the most recent date gets rank 1:

df1 = df.sort_values(['id', 'date'], ascending=[True, False])
# Build the date -> 1-based position lookup once per id, then map every row through it
df1['date_rank'] = df1.groupby('id')['date'].transform(
    lambda x: x.map({d: i + 1 for i, d in enumerate(x.unique())}))

Out:

                date    id  value   date_rank
0   10/01/2017 15:45:00 1   0.01    1
1   10/01/2017 15:45:00 1   0.40    1
2   05/01/2017 15:56:00 1   0.50    2
3   11/01/2017 15:22:00 2   0.70    1
4   11/01/2017 15:22:00 2   0.77    1
5   06/01/2017 11:02:00 3   0.10    1
6   05/01/2017 09:37:00 3   0.20    2
7   05/01/2017 09:37:00 3   0.30    2
8   05/01/2017 09:55:00 4   0.11    1
9   05/01/2017 09:55:00 4   0.21    1
11  05/01/2017 10:09:00 5   0.01    1
10  05/01/2017 10:08:00 5   0.40    2
14  03/02/2017 09:31:00 6   0.80    1
13  03/02/2017 09:15:00 6   0.50    2
12  03/02/2017 08:55:00 6   3.00    3
16  19/01/2017 16:34:00 7   0.10    1
15  09/01/2017 15:42:00 7   0.90    2
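For the exact output (most recent date = 1), a parsed-date sketch can use the same dense rank with `ascending=False`. It also shows `pd.factorize` as a built-in alternative to the hand-built dictionary: on rows already sorted newest-first, it numbers each group's distinct dates in order of first appearance (0-based, hence `+ 1`). Both assume the day-first format `%d/%m/%Y %H:%M:%S`, and the column name `date_rank_factorize` is purely illustrative.

```python
import pandas as pd

# Updated sample data from the question
df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 7],
    'value': [.01, .4, .5, .7, .77, .1, .2, .3, .11, .21, .4, .01, 3, .5, .8, .9, .1],
    'date': ['10/01/2017 15:45:00', '10/01/2017 15:45:00', '05/01/2017 15:56:00',
             '11/01/2017 15:22:00', '11/01/2017 15:22:00', '06/01/2017 11:02:00',
             '05/01/2017 09:37:00', '05/01/2017 09:37:00', '05/01/2017 09:55:00',
             '05/01/2017 09:55:00', '05/01/2017 10:08:00', '05/01/2017 10:09:00',
             '03/02/2017 08:55:00', '03/02/2017 09:15:00', '03/02/2017 09:31:00',
             '09/01/2017 15:42:00', '19/01/2017 16:34:00']})
df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y %H:%M:%S')

# ascending=False: the newest date in each id gets 1; ties share a rank
df['date_rank'] = df.groupby('id')['date'].rank(method='dense', ascending=False).astype(int)

# Alternative: sort newest-first, then factorize each group's dates by first appearance
df1 = df.sort_values(['id', 'date'], ascending=[True, False])
df1['date_rank_factorize'] = df1.groupby('id')['date'].transform(
    lambda s: pd.Series(pd.factorize(s)[0] + 1, index=s.index))
print(df1)
```

The rank version does not depend on the row order, while the factorize version relies on the preceding sort, which is the same assumption the dictionary approach makes.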
