Groupby and resample using forward and backward fill in window in Python


Question

I want to resample the data column at a frequency of 1min, using forward fill (ffill) followed by backward fill (bfill), while grouping df by the id column.

df:

           id   timestamp                 data
      1    1    2017-01-02 13:14:53.040   10.0
      2    1    2017-01-02 16:04:43.240   11.0
                           ...
      4    2    2017-01-02 15:22:06.540    1.0
      5    2    2017-01-03 13:55:34.240    2.0
                           ...

I used:

pd.DataFrame(df.set_index('timestamp').groupby('id', sort=True)['data'].resample('1min').ffill().bfill())

How can I add a further condition, so that the resampling covers a window of the past 10 days from now? The last timestamp reading would be now, and the first would be datetime.datetime.now() - pd.to_timedelta("10day"). The goal is to have the same number of readings for each id group.

Update:

I tried:

start = datetime.datetime.now() - pd.to_timedelta("10day")
end = datetime.datetime.now()

r = pd.to_datetime(pd.date_range(start=start, end=end, freq='1h'))

pd.DataFrame(df.reset_index().set_index('timestamp').groupby('id', sort=True).reindex(r)['data'].resample('1h').ffill().bfill())

which returns:

AttributeError: 'DataFrameGroupBy' object has no attribute 'reindex'

So reindex cannot be applied to a groupby object. Is there a way to work around this?
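One possible workaround, offered as a sketch rather than as the method from the answer below, is to reindex each group separately and then concatenate the pieces. The toy data, the fixed end time (standing in for datetime.datetime.now()), and the helper name fill_on_grid are all assumptions made for illustration. The sketch also assumes that timestamps are unique within each id, since reindex rejects duplicate labels.

```python
import pandas as pd

# Toy data shaped like the question's df (values are illustrative only).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-01-02 13:14:53.040",
        "2017-01-02 16:04:43.240",
        "2017-01-02 15:22:06.540",
        "2017-01-03 13:55:34.240",
    ]),
    "data": [10.0, 11.0, 1.0, 2.0],
})

# Assumption: a fixed end time instead of datetime.datetime.now(),
# so that the example is reproducible.
end = pd.Timestamp("2017-01-04 00:00:00")
start = end - pd.Timedelta(days=10)
grid = pd.date_range(start=start, end=end, freq="1h")


def fill_on_grid(group: pd.DataFrame) -> pd.Series:
    """Forward fill, then backward fill, one group's readings onto the grid."""
    s = group.set_index("timestamp")["data"].sort_index()
    # Reindex on the union first so the original readings stay visible to
    # ffill/bfill, then keep only the grid points.
    combined = s.reindex(s.index.union(grid)).ffill().bfill()
    return combined.reindex(grid)


# Build one filled Series per id and stack them under an (id, timestamp) index.
pieces = {key: fill_on_grid(group) for key, group in df.groupby("id")}
out = pd.concat(pieces).rename_axis(["id", "timestamp"])

print(out.groupby(level="id").size())
```

Because every group is reindexed onto the same grid, each id ends up with the same number of readings, which is the stated goal.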

Recommended answer

Without data, I can't really test this, so take it as a suggestion (posted as an answer only for proper formatting). Since you are looking to resample with bfill/ffill, I think merge_asof would work:

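import numpy as np
import pandas as pd

# common time window (the start must precede the end, otherwise the range is empty)
r = pd.date_range(start='2016-12-23', end='2017-01-02 23:00:00', freq='1h')

# unique id
unique_ids = df['id'].unique()

# new time reference: one row per (id, timestamp) pair,
# using the same column name as df so the merge key matches
new_df = pd.DataFrame({'id': np.repeat(unique_ids, len(r)),
                       'timestamp': np.tile(r, len(unique_ids)),
                      })

# merge_asof requires both frames to be sorted by the `on` key
new_df = new_df.sort_values('timestamp')
df = df.sort_values('timestamp')

# the default `direction='backward'` takes the last reading at or before
# each grid time (like ffill); `direction='forward'` takes the next reading
# at or after it (like bfill)
out = pd.merge_asof(new_df, df, on='timestamp', by='id')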
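To see how this suggestion behaves, here is a small self-contained sketch on toy data. The sample values and the short fixed window are assumptions chosen so the result is easy to inspect, and the grid column is named timestamp to match df. A backward merge_asof leaves NaN at grid times that precede an id's first reading, so the sketch adds a per-id bfill afterwards. That extra step is not part of the original suggestion.

```python
import numpy as np
import pandas as pd

# Toy data shaped like the question's df (values are illustrative only).
df = pd.DataFrame({
    "id": [1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2017-01-02 13:14:53.040",
        "2017-01-02 16:04:43.240",
        "2017-01-02 15:22:06.540",
        "2017-01-03 13:55:34.240",
    ]),
    "data": [10.0, 11.0, 1.0, 2.0],
})

# Assumption: a short fixed window instead of "now minus 10 days".
grid = pd.date_range(start="2017-01-02 12:00", end="2017-01-03 15:00", freq="1h")
ids = df["id"].unique()

# One row per (id, grid time) pair.
new_df = pd.DataFrame({
    "id": np.repeat(ids, len(grid)),
    "timestamp": np.tile(grid, len(ids)),
})

# merge_asof expects the key columns to share a dtype and to be sorted.
new_df["timestamp"] = new_df["timestamp"].astype("datetime64[ns]")
df["timestamp"] = df["timestamp"].astype("datetime64[ns]")
left = new_df.sort_values("timestamp")
right = df.sort_values("timestamp")

# direction='backward' (the default) behaves like a forward fill.
out = pd.merge_asof(left, right, on="timestamp", by="id")

# Grid times before an id's first reading are still NaN; back-fill them per id.
out = out.sort_values(["id", "timestamp"]).reset_index(drop=True)
out["data"] = out.groupby("id")["data"].bfill()

print(out.groupby("id").size())
```

Each id receives one row per grid point, so the groups have equal length regardless of how many raw readings they contain.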
                   
