Create lag features based on multiple columns
Question
I have a time series dataset and need to extract lag features. I am using the code below, but it returns all NaN values:
df.groupby(['week','id1','id2','id3'],as_index=False)['value'].shift(1)
Input
week,id1,id2,id3,value
1,101,123,001,45
1,102,231,004,89
1,203,435,099,65
2,101,123,001,48
2,102,231,004,75
2,203,435,099,90
Expected output
week,id1,id2,id3,value,t-1
1,101,123,001,45,NAN
1,102,231,004,89,NAN
1,203,435,099,65,NAN
2,101,123,001,48,45
2,102,231,004,75,89
2,203,435,099,90,65
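Recommended answer
You want to shift to the next week, so remove 'week' from the grouping. With 'week' included, every row forms its own single-row group, so shift has no earlier row to pull from and returns NaN everywhere: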
df['t-1'] = df.groupby(['id1','id2','id3'],as_index=False)['value'].shift()
# week id1 id2 id3 value t-1
#0 1 101 123 1 45 NaN
#1 1 102 231 4 89 NaN
#2 1 203 435 99 65 NaN
#3 2 101 123 1 48 45.0
#4 2 102 231 4 75 89.0
#5 2 203 435 99 90 65.0
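To check this on the sample data, here is a minimal, self-contained sketch. The column names `with_week` and `without_week` and the `group_sizes` variable are mine, not from the question, and it assumes the rows are already ordered by week within each id combination.

```python
import pandas as pd

# Rebuild the sample input from the question
df = pd.DataFrame({
    "week":  [1, 1, 1, 2, 2, 2],
    "id1":   [101, 102, 203, 101, 102, 203],
    "id2":   [123, 231, 435, 123, 231, 435],
    "id3":   [1, 4, 99, 1, 4, 99],
    "value": [45, 89, 65, 48, 75, 90],
})

# Every (week, id1, id2, id3) combination is unique, so each group has one row
group_sizes = df.groupby(["week", "id1", "id2", "id3"]).size()

# Grouping with 'week': nothing to shift from inside a one-row group
df["with_week"] = df.groupby(["week", "id1", "id2", "id3"])["value"].shift(1)

# Grouping without 'week': the previous row of the same id combination
df["without_week"] = df.groupby(["id1", "id2", "id3"])["value"].shift(1)

print(group_sizes.max())
print(df)
```

If the rows are not already in week order, sorting by 'week' before the groupby is probably the safest way to make the shift meaningful.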
That approach is error-prone when weeks are missing, because shift takes the previous row in the group, whichever week it belongs to. In that case we can merge after incrementing the week, which ensures the lagged value comes from the prior week regardless of missing weeks.
df2 = df.assign(week=df.week+1).rename(columns={'value': 't-1'})
df = df.merge(df2, on=['week', 'id1', 'id2', 'id3'], how='left')
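The difference between the two approaches only shows up when a week is absent. The sketch below uses made-up data (not from the question) with a single id column, in which id 101 has no row for week 2. The column names `shift_lag` and `merge_lag` are illustrative.

```python
import pandas as pd

# Hypothetical data: id 101 has weeks 1 and 3 only (week 2 is missing)
df = pd.DataFrame({
    "week":  [1, 3, 1, 2],
    "id1":   [101, 101, 102, 102],
    "value": [45, 48, 89, 75],
})
df = df.sort_values(["id1", "week"]).reset_index(drop=True)

# shift() takes the previous row in the group, whatever week it is,
# so week 3 of id 101 receives the week 1 value
df["shift_lag"] = df.groupby("id1")["value"].shift()

# Merging on week + 1 only matches a row from exactly one week earlier,
# so week 3 of id 101 stays NaN
prior = df[["week", "id1", "value"]].assign(week=lambda d: d["week"] + 1)
prior = prior.rename(columns={"value": "merge_lag"})
df = df.merge(prior, on=["week", "id1"], how="left")

print(df)
```

Which behaviour is right depends on what "lag" should mean for your features: the previous observation, or the observation from exactly one week before.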
Another way to bring in and rename many columns is to use the suffixes argument of merge. It renames all overlapping columns (other than the keys) in the right DataFrame.
df.merge(df.assign(week=df.week+1), # Manually lag
on=['week', 'id1', 'id2', 'id3'],
how='left',
suffixes=['', '_lagged'] # Right df columns -> _lagged
)
# week id1 id2 id3 value value_lagged
#0 1 101 123 1 45 NaN
#1 1 102 231 4 89 NaN
#2 1 203 435 99 65 NaN
#3 2 101 123 1 48 45.0
#4 2 102 231 4 75 89.0
#5 2 203 435 99 90 65.0
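If several lags are needed, the same week-shifted merge can be repeated in a loop. The helper below, `add_week_lags`, is a hypothetical function written for illustration, not part of pandas. It renames the columns explicitly instead of using suffixes so that each lag gets a distinct name, and it assumes each combination of week and keys appears at most once (otherwise the merge would duplicate rows). The sample data is made up.

```python
import pandas as pd

def add_week_lags(df, keys, value_cols, lags, week_col="week"):
    """Hypothetical helper: add <col>_lag<k> columns via week-shifted self-merges."""
    out = df.copy()
    for k in lags:
        # Move each row k weeks forward so it lines up with the week it is a lag for
        shifted = df[[week_col, *keys, *value_cols]].copy()
        shifted[week_col] = shifted[week_col] + k
        shifted = shifted.rename(columns={c: f"{c}_lag{k}" for c in value_cols})
        out = out.merge(shifted, on=[week_col, *keys], how="left")
    return out

# Made-up data with a single id column and three weeks
df = pd.DataFrame({
    "week":  [1, 1, 2, 2, 3, 3],
    "id1":   [101, 102, 101, 102, 101, 102],
    "value": [45, 89, 48, 75, 50, 70],
})

result = add_week_lags(df, keys=["id1"], value_cols=["value"], lags=[1, 2])
print(result)
```

For the data in the question, the call would presumably be `add_week_lags(df, keys=['id1', 'id2', 'id3'], value_cols=['value'], lags=[1])`.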