Sampling with the most recent value

Question

Consider the following Series:

created_at
2014-01-27 21:50:05.040961    80000.00
2014-03-12 18:46:45.517968    79900.00
2014-09-05 20:54:17.991260    63605.31
2014-11-04 01:16:08.286631    64405.31
2014-11-04 01:17:26.398272    63605.31
2014-11-04 01:24:38.225306    64405.31
2014-11-13 19:32:14.273478    65205.31
Name: my_series, dtype: float64

I need to sample this Series on a specific set of pre-defined days (e.g. every day from 2014-12-01 to 2014-12-07). On each such sample, I would like to get the most recent value available from the original Series.

I have been looking at resample (see also this and this thread), since it looks like the right tool for the job. However, I don't have a good grasp of the function yet. Can resample be used for this? If so, how?

Recommended answer

If you first define the set of pre-defined days (days in the example below), you can reindex with it and specify the fill method. 'ffill' propagates the last valid observation forward, which for a time series means taking the most recent value at or before each requested timestamp:

In [19]: s
Out[19]: 
time
2014-01-27 21:50:05.040961    80000.00
2014-03-12 18:46:45.517968    79900.00
2014-09-05 20:54:17.991260    63605.31
2014-11-04 01:16:08.286631    64405.31
2014-11-04 01:17:26.398272    63605.31
2014-11-04 01:24:38.225306    64405.31
2014-11-13 19:32:14.273478    65205.31
Name: my_series, dtype: float64

In [20]: days = pd.date_range('2014-12-01', '2014-12-07')

In [21]: s.reindex(days, method='ffill')
Out[21]: 
2014-12-01    65205.31
2014-12-02    65205.31
2014-12-03    65205.31
2014-12-04    65205.31
2014-12-05    65205.31
2014-12-06    65205.31
2014-12-07    65205.31
Freq: D, Name: my_series, dtype: float64

In this case (the example dates you gave), every day gets the same value, because the most recent observation in the original series is the same for all of those dates.
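To see the forward fill pick up different values, here is a minimal self-contained sketch. It rebuilds the Series from the question and reindexes it on a hypothetical set of sample days (not from the original post) chosen to fall between observations. A sample day earlier than the first observation has nothing to fill from, so it comes back as NaN.

```python
import pandas as pd

# Rebuild the Series from the question (values copied from the post).
s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=pd.to_datetime([
        "2014-01-27 21:50:05.040961",
        "2014-03-12 18:46:45.517968",
        "2014-09-05 20:54:17.991260",
        "2014-11-04 01:16:08.286631",
        "2014-11-04 01:17:26.398272",
        "2014-11-04 01:24:38.225306",
        "2014-11-13 19:32:14.273478",
    ]),
    name="my_series",
)

# Hypothetical sample days (an assumption for illustration) that
# straddle several of the observations above.
days = pd.to_datetime(
    ["2014-01-01", "2014-03-01", "2014-10-01", "2014-11-05", "2014-12-01"]
)

# Each sample day takes the last observation at or before that timestamp.
sampled = s.reindex(days, method="ffill")
print(sampled)
# 2014-01-01 -> NaN       (no earlier observation exists)
# 2014-03-01 -> 80000.00
# 2014-10-01 -> 63605.31
# 2014-11-05 -> 64405.31
# 2014-12-01 -> 65205.31
```

Note that reindexing with a fill method needs a sorted (monotonic) index on the original Series, which holds for the data in the question.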

If you don't want to give a specific set of days, but just want every date from the start to the end of the original Series, you can use resample to achieve the same result:

In [23]: s.resample('D').last().ffill()  # older pandas: s.resample('D', how='last', fill_method='ffill')
Out[23]: 
time
2014-01-27    80000
2014-01-28    80000
2014-01-29    80000
2014-01-30    80000
...
2014-11-10    64405.31
2014-11-11    64405.31
2014-11-12    64405.31
2014-11-13    65205.31
Freq: D, Name: my_series, Length: 291
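One detail worth checking is that the two approaches do not label days the same way. The sketch below, which rebuilds the Series from the question and uses the method-chaining form of resample, suggests that resample assigns each day the last observation made during that day, while reindex on midnight timestamps only sees observations at or before 00:00 of that day. The variable names daily, midnight, and at_midnight are illustrative only.

```python
import pandas as pd

# Rebuild the Series from the question (values copied from the post).
s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=pd.to_datetime([
        "2014-01-27 21:50:05.040961",
        "2014-03-12 18:46:45.517968",
        "2014-09-05 20:54:17.991260",
        "2014-11-04 01:16:08.286631",
        "2014-11-04 01:17:26.398272",
        "2014-11-04 01:24:38.225306",
        "2014-11-13 19:32:14.273478",
    ]),
    name="my_series",
)

# Last observation within each calendar day, carried across empty days.
daily = s.resample("D").last().ffill()
print(len(daily))                             # 291 daily rows
print(daily.loc[pd.Timestamp("2014-11-13")])  # 65205.31 (observed at 19:32 that day)

# Reindexing on the same days at midnight only looks backwards from 00:00.
midnight = pd.date_range(daily.index[0], daily.index[-1], freq="D")
at_midnight = s.reindex(midnight, method="ffill")
print(at_midnight.loc[pd.Timestamp("2014-11-13")])  # 64405.31 (from 2014-11-04)
print(at_midnight.iloc[0])                          # nan (first observation is at 21:50)
```

Which of the two fits depends on whether a sample for a given day should mean "the value as of the start of that day" (reindex) or "the value as of the end of that day" (resample).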