Groupby and resample using forward and backward fill within a time window in Python
Question
I want to resample the data column using forward fill (ffill) and backward fill (bfill) at a frequency of 1min while grouping df by the id column.
df:
id timestamp data
1 1 2017-01-02 13:14:53.040 10.0
2 1 2017-01-02 16:04:43.240 11.0
...
4 2 2017-01-02 15:22:06.540 1.0
5 2 2017-01-03 13:55:34.240 2.0
...
I used:
pd.DataFrame(df.set_index('timestamp').groupby('id', sort=True)['data'].resample('1min').ffill().bfill())
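A minimal sketch of the approach above, run on a hypothetical toy frame built only from the four rows shown (the elided rows are not reproduced). It applies the backward fill through groupby(level="id") so it stays inside each id, a small variation on the plain .bfill() in the question. It also shows why the question needs a common window: each id is resampled only over its own observed span, so the groups end up with different lengths.

```python
import pandas as pd

# Hypothetical toy data shaped like the question's df (only the rows shown there).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-01-02 13:14:53.040", "2017-01-02 16:04:43.240",
        "2017-01-02 15:22:06.540", "2017-01-03 13:55:34.240",
    ]),
    "data": [10.0, 11.0, 1.0, 2.0],
})

# Upsample each id to 1-minute bins. ffill leaves the first bin of each id NaN,
# because that bin label (the floored first timestamp) precedes the first reading.
filled = (
    df.set_index("timestamp")
      .groupby("id", sort=True)["data"]
      .resample("1min")
      .ffill()
      .groupby(level="id")
      .bfill()  # backward fill restricted to each id group
)

# Each id only covers its own span, so the row counts per id differ.
counts = filled.groupby(level="id").size()
print(counts)
```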
How can I add an additional condition so that the resampling happens within a window covering the past 10 days from now? That is, the last timestamp reading should be now and the first timestamp reading should be datetime.datetime.now() - pd.to_timedelta("10day"). The goal is to have the same number of readings for each id group.
Update:
I tried:
start = datetime.datetime.now() - pd.to_timedelta("10day")
end = datetime.datetime.now()
r = pd.to_datetime(pd.date_range(start=start, end=end, freq='1h'))
pd.DataFrame(df.reset_index().set_index('timestamp').groupby('id', sort=True).reindex(r)['data'].resample('1h').ffill().bfill())
and it returned:
AttributeError: 'DataFrameGroupBy' object has no attribute 'reindex'
So I'm not supposed to apply reindex to a groupby object. Is there a way I can work around it?
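One possible workaround, sketched as an assumption rather than taken from the thread: since DataFrameGroupBy has no reindex, reindex each group's Series separately onto one shared grid and stitch the pieces back together with pd.concat. The helper name resample_window, its parameters, and the toy data are hypothetical, and the sketch assumes timestamps are unique within each id. Passing pd.Timestamp.now() as end would give the "past 10 days from now" window; a fixed end is used here so the result is reproducible.

```python
import pandas as pd


def resample_window(df: pd.DataFrame, end: pd.Timestamp,
                    days: int = 10, freq: str = "1h") -> pd.Series:
    """Hypothetical helper: put every id on the same grid from end - days to end."""
    # Use one datetime resolution for both the readings and the grid.
    df = df.assign(timestamp=df["timestamp"].astype("datetime64[ns]"))
    grid = pd.date_range(start=end - pd.Timedelta(days=days), end=end,
                         freq=freq, name="timestamp").astype("datetime64[ns]")

    def fill_on_grid(s: pd.Series) -> pd.Series:
        # Add the grid points to the raw timestamps so ffill sees the last real
        # reading before each grid point; bfill then covers grid points before
        # the first reading; finally keep only the grid points.
        merged = s.reindex(s.index.union(grid)).ffill().bfill()
        return merged.reindex(grid)

    # reindex is applied to each group's Series, not to the DataFrameGroupBy.
    pieces = {
        key: fill_on_grid(g.set_index("timestamp")["data"].sort_index())
        for key, g in df.groupby("id", sort=True)
    }
    return pd.concat(pieces, names=["id"])


# Hypothetical toy data shaped like the question's df.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-01-02 13:14:53.040", "2017-01-02 16:04:43.240",
        "2017-01-02 15:22:06.540", "2017-01-03 13:55:34.240",
    ]),
    "data": [10.0, 11.0, 1.0, 2.0],
})

# In practice end would be pd.Timestamp.now(); a fixed value keeps this reproducible.
out = resample_window(df, end=pd.Timestamp("2017-01-03 14:00"))
print(out.groupby(level=0).size())
```

Every id now has exactly one row per grid point, which is the "same number of readings for each id" the question asks for.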
Answer
Without data I can't really test this, so take it as a suggestion (a comment posted as an answer for proper formatting). Since you are looking to resample with bfill/ffill, I think merge_asof would work:
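import numpy as np
import pandas as pd

# common time window (start must be earlier than end, otherwise the range is empty)
r = pd.date_range(start='2016-12-23', end='2017-01-02 23:00:00', freq='1h')
# unique ids
unique_ids = df['id'].unique()
# new time reference: every id paired with every time in r
new_df = pd.DataFrame({'id': np.repeat(unique_ids, len(r)),
                       'time': np.tile(r, len(unique_ids)),
                       })
# merge_asof requires both frames to be sorted by the time key
# (df['timestamp'] must already be a datetime column)
new_df = new_df.sort_values('time')
df = df.sort_values('timestamp')
# default of merge_asof is direction='backward': last reading at or before each time (like ffill)
# change to direction='forward' to take the next reading at or after each time (like bfill)
out = pd.merge_asof(new_df, df, left_on='time', right_on='timestamp', by='id')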
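Since the answer could not be tested without data, here is a self-contained sketch of the same merge_asof idea on a hypothetical toy frame built from the rows shown in the question, with an hourly window chosen just for the example. Combining a backward match with a forward match through fillna is an addition, not part of the original answer: it back-fills grid times that fall before an id's first reading, roughly mimicking the question's ffill().bfill().

```python
import numpy as np
import pandas as pd

# Hypothetical toy data shaped like the question's df.
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-01-02 13:14:53.040", "2017-01-02 16:04:43.240",
        "2017-01-02 15:22:06.540", "2017-01-03 13:55:34.240",
    ]).astype("datetime64[ns]"),
    "data": [10.0, 11.0, 1.0, 2.0],
})

# Common hourly grid (a fixed window chosen for this example).
r = pd.date_range(start="2017-01-02 12:00", end="2017-01-03 14:00", freq="1h")
unique_ids = df["id"].unique()
new_df = pd.DataFrame({"id": np.repeat(unique_ids, len(r)),
                       "time": np.tile(r, len(unique_ids))})
# merge_asof needs matching key dtypes and both frames sorted by the key.
new_df["time"] = new_df["time"].astype("datetime64[ns]")
new_df = new_df.sort_values("time")
right = df.sort_values("timestamp")

# direction="backward" (default): last reading at or before each grid time.
back = pd.merge_asof(new_df, right, left_on="time", right_on="timestamp", by="id")
# direction="forward": first reading at or after each grid time.
fwd = pd.merge_asof(new_df, right, left_on="time", right_on="timestamp",
                    by="id", direction="forward")

# Prefer the ffill-like value; fall back to the bfill-like one before the first reading.
combined = back.assign(data=back["data"].fillna(fwd["data"]))
print(combined.groupby("id").size())
```

Both merges use the same left frame, so their rows line up one to one and the fillna aligns correctly; each id ends up with len(r) rows.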