Create lag features based on multiple columns


Question

I have a time series dataset and need to extract lag features. I am using the code below, but every value comes back as NaN:

df.groupby(['week','id1','id2','id3'],as_index=False)['value'].shift(1)

Input

week,id1,id2,id3,value
1,101,123,001,45
1,102,231,004,89
1,203,435,099,65
2,101,123,001,48
2,102,231,004,75
2,203,435,099,90

Expected output

week,id1,id2,id3,value,t-1
1,101,123,001,45,NAN
1,102,231,004,89,NAN
1,203,435,099,65,NAN
2,101,123,001,48,45
2,102,231,004,75,89
2,203,435,099,90,65

Recommended answer

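In the sample data every (week, id1, id2, id3) combination occurs exactly once, so each group holds a single row and shift(1) has nothing to shift from. You want to shift from one week to the next, so remove 'week' from the grouping (the rows must already be ordered by week within each id combination):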

df['t-1'] = df.groupby(['id1','id2','id3'],as_index=False)['value'].shift()
#    week  id1  id2  id3  value   t-1
#0     1  101  123    1     45   NaN
#1     1  102  231    4     89   NaN
#2     1  203  435   99     65   NaN
#3     2  101  123    1     48  45.0
#4     2  102  231    4     75  89.0
#5     2  203  435   99     90  65.0
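
To make the difference concrete, here is a minimal, self-contained sketch that rebuilds the sample input from the question and runs both groupings side by side. The variable names (`csv_text`, `all_nan`) and the explicit sort by week are additions for illustration and are not part of the original answer.

```python
import io

import pandas as pd

# Sample input copied from the question; read_csv parses id3 as integers,
# so "001" becomes 1, which matches the output shown above.
csv_text = """week,id1,id2,id3,value
1,101,123,001,45
1,102,231,004,89
1,203,435,099,65
2,101,123,001,48
2,102,231,004,75
2,203,435,099,90
"""
df = pd.read_csv(io.StringIO(csv_text))

# Original attempt: 'week' is part of the key, so every group has one row
# and the shifted column is NaN everywhere.
all_nan = df.groupby(['week', 'id1', 'id2', 'id3'])['value'].shift(1)

# Fix: group only by the id columns. Sorting by week first (an added
# safeguard) makes sure the shift follows time order within each group.
df = df.sort_values('week', kind='mergesort')
df['t-1'] = df.groupby(['id1', 'id2', 'id3'])['value'].shift(1)

print(all_nan.isna().all())
print(df)
```

The sort is a no-op for this input, but it guards against files whose rows are not already in week order.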


That approach is error-prone when weeks are missing, because shift() returns the previous available row for the group rather than the row for the previous week. In that case we can merge after incrementing the week, which ensures the lagged value comes from the prior week regardless of gaps.

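# Start from the original df (without the 't-1' column added above).
# Moving each row forward one week lines its value up with the following week.
df2 = df.assign(week=df.week+1).rename(columns={'value': 't-1'})
df = df.merge(df2, on=['week', 'id1', 'id2', 'id3'], how='left')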
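
The following sketch illustrates the difference on a gap. The data is made up for the example (a single id with week 3 missing), and the column names `shift_lag` and `merge_lag` are hypothetical labels used only to compare the two approaches.

```python
import pandas as pd

# Hypothetical data: one id, with week 3 missing.
df = pd.DataFrame({
    'week': [1, 2, 4],
    'id1': [101, 101, 101],
    'value': [45, 48, 50],
})

# groupby + shift takes the previous row, so week 4 receives week 2's value.
df['shift_lag'] = df.groupby('id1')['value'].shift(1)

# Merge-based lag: move every row forward one week, then join on the keys.
# Week 4 finds no row for week 3, so its lag stays NaN.
prior = (
    df[['week', 'id1', 'value']]
    .assign(week=df['week'] + 1)
    .rename(columns={'value': 'merge_lag'})
)
out = df.merge(prior, on=['week', 'id1'], how='left')

print(out)
```

Which behaviour is right depends on the feature you want: "previous observation" corresponds to the shift, "value exactly one week earlier" corresponds to the merge.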


Another way to bring over and rename many columns at once is to use the suffixes argument of merge. It renames all overlapping columns that are not keys; here the left suffix is empty, so only the columns from the right DataFrame are renamed.

df.merge(df.assign(week=df.week+1),         # Manually lag
         on=['week', 'id1', 'id2', 'id3'], 
         how='left',
         suffixes=['', '_lagged']           # Right df columns -> _lagged
         )
#   week  id1  id2  id3  value  value_lagged
#0     1  101  123    1     45           NaN
#1     1  102  231    4     89           NaN
#2     1  203  435   99     65           NaN
#3     2  101  123    1     48          45.0
#4     2  102  231    4     75          89.0
#5     2  203  435   99     90          65.0
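
As a rough illustration of the "many columns" case, the sketch below lags two value columns in a single merge. The `price` column and the reduced key list are assumptions made up for the example; they do not appear in the original question.

```python
import pandas as pd

# Hypothetical frame with two columns to lag ('price' is invented here).
df = pd.DataFrame({
    'week': [1, 1, 2, 2],
    'id1': [101, 102, 101, 102],
    'value': [45, 89, 48, 75],
    'price': [1.5, 2.0, 1.6, 2.1],
})
keys = ['week', 'id1']

# Every non-key column on the right side gets the '_lagged' suffix;
# the empty left suffix keeps the current-week columns unchanged.
lagged = df.merge(
    df.assign(week=df['week'] + 1),
    on=keys,
    how='left',
    suffixes=('', '_lagged'),
)

print(lagged)
```

Because the suffix applies to every overlapping non-key column, adding more value columns to `df` requires no change to the merge call.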
