相当于Oracle Lead/Lag函数的Pandas [英] Pandas equivalent of Oracle Lead/Lag function

查看:85
本文介绍了相当于Oracle Lead/Lag函数的Pandas的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

首先,我是熊猫的新手,但我已经爱上了它.我正在尝试实现与Oracle的滞后功能等效的功能.

First I'm new to pandas, but I'm already falling in love with it. I'm trying to implement the equivalent of the Lag function from Oracle.

假设您有此DataFrame:

Let's suppose you have this DataFrame:

Date                   Group      Data
2014-05-14 09:10:00        A         1
2014-05-14 09:20:00        A         2
2014-05-14 09:30:00        A         3
2014-05-14 09:40:00        A         4
2014-05-14 09:50:00        A         5
2014-05-14 10:00:00        B         1
2014-05-14 10:10:00        B         2
2014-05-14 10:20:00        B         3
2014-05-14 10:30:00        B         4

如果这是一个oracle数据库,而我想创建一个按组"列分组并按日期排序的滞后函数,则我可以轻松地使用此函数:

If this was an oracle database and I wanted to create a lag function grouped by the "Group" column and ordered by the Date I could easily use this function:

 LAG(Data,1,NULL) OVER (PARTITION BY Group ORDER BY Date ASC) AS Data_lagged

这将产生下表:

Date                   Group     Data    Data lagged
2014-05-14 09:10:00        A        1           Null
2014-05-14 09:20:00        A        2            1
2014-05-14 09:30:00        A        3            2
2014-05-14 09:40:00        A        4            3
2014-05-14 09:50:00        A        5            4
2014-05-14 10:00:00        B        1           Null
2014-05-14 10:10:00        B        2            1
2014-05-14 10:20:00        B        3            2
2014-05-14 10:30:00        B        4            3

在熊猫中,我可以将日期设置为索引并使用shift方法:

In pandas I can set the date to be an index and use the shift method:

db["Data_lagged"] = db.Data.shift(1)

唯一的问题是,它不会按列分组.即使将日期"和组"两列设置为索引,我仍然会在滞后"列中获得"5".

The only issue is that this doesn't group by a column. Even if I set the two columns Date and Group as indexes, I would still get the "5" in the lagged column.

有没有办法在Pandas中实现等效的Lead和lag功能?

Is there a way to implement the equivalent of the Lead and lag functions in Pandas?

推荐答案

您可以执行要获得ORDER BY Date ASC效果,必须首先对DataFrame进行排序:

To obtain the ORDER BY Date ASC effect, you must sort the DataFrame first:

df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
                       .groupby(['Group'])['Data'].shift(1))

这篇关于相当于Oracle Lead/Lag函数的Pandas的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆