相当于Oracle Lead/Lag函数的Pandas [英] Pandas equivalent of Oracle Lead/Lag function
问题描述
首先,我是熊猫的新手,但我已经爱上了它.我正在尝试实现与Oracle的滞后功能等效的功能.
First I'm new to pandas, but I'm already falling in love with it. I'm trying to implement the equivalent of the Lag function from Oracle.
假设您有此DataFrame:
Let's suppose you have this DataFrame:
Date Group Data
2014-05-14 09:10:00 A 1
2014-05-14 09:20:00 A 2
2014-05-14 09:30:00 A 3
2014-05-14 09:40:00 A 4
2014-05-14 09:50:00 A 5
2014-05-14 10:00:00 B 1
2014-05-14 10:10:00 B 2
2014-05-14 10:20:00 B 3
2014-05-14 10:30:00 B 4
如果这是一个oracle数据库,而我想创建一个按组"列分组并按日期排序的滞后函数,则我可以轻松地使用此函数:
If this was an oracle database and I wanted to create a lag function grouped by the "Group" column and ordered by the Date I could easily use this function:
LAG(Data,1,NULL) OVER (PARTITION BY Group ORDER BY Date ASC) AS Data_lagged
这将产生下表:
Date Group Data Data lagged
2014-05-14 09:10:00 A 1 Null
2014-05-14 09:20:00 A 2 1
2014-05-14 09:30:00 A 3 2
2014-05-14 09:40:00 A 4 3
2014-05-14 09:50:00 A 5 4
2014-05-14 10:00:00 B 1 Null
2014-05-14 10:10:00 B 2 1
2014-05-14 10:20:00 B 3 2
2014-05-14 10:30:00 B 4 3
在熊猫中,我可以将日期设置为索引并使用shift方法:
In pandas I can set the date to be an index and use the shift method:
db["Data_lagged"] = db.Data.shift(1)
唯一的问题是,它不会按列分组.即使将日期"和组"两列设置为索引,我仍然会在滞后"列中获得"5".
The only issue is that this doesn't group by a column. Even if I set the two columns Date and Group as indexes, I would still get the "5" in the lagged column.
有没有办法在Pandas中实现等效的Lead和lag功能?
Is there a way to implement the equivalent of the Lead and lag functions in Pandas?
推荐答案
您可以执行要获得ORDER BY Date ASC
效果,必须首先对DataFrame进行排序:
To obtain the ORDER BY Date ASC
effect, you must sort the DataFrame first:
df['Data_lagged'] = (df.sort_values(by=['Date'], ascending=True)
.groupby(['Group'])['Data'].shift(1))
这篇关于相当于Oracle Lead/Lag函数的Pandas的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!