Sampling with the most recent value
Question
Consider the following Series:
created_at
2014-01-27 21:50:05.040961 80000.00
2014-03-12 18:46:45.517968 79900.00
2014-09-05 20:54:17.991260 63605.31
2014-11-04 01:16:08.286631 64405.31
2014-11-04 01:17:26.398272 63605.31
2014-11-04 01:24:38.225306 64405.31
2014-11-13 19:32:14.273478 65205.31
Name: my_series, dtype: float64
I need to sample this Series on a specific set of pre-defined days (e.g. every day from 2014-12-01 to 2014-12-07). For each sample, I would like to get the most recent value available in the original Series.
I have been looking at resample (see also this and this thread), since it looks like the right tool for the job. However, I don't have a good grasp of the function yet.
Can resample be used for this? If so, how?
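To make the question reproducible, here is a minimal sketch that rebuilds the Series and the pre-defined days. The question does not show how the Series was created, so parsing the timestamps with pd.to_datetime is an assumption; the values and the my_series / created_at names are copied from the listing above.

```python
import pandas as pd

# Timestamps copied from the Series shown in the question.
index = pd.to_datetime([
    "2014-01-27 21:50:05.040961",
    "2014-03-12 18:46:45.517968",
    "2014-09-05 20:54:17.991260",
    "2014-11-04 01:16:08.286631",
    "2014-11-04 01:17:26.398272",
    "2014-11-04 01:24:38.225306",
    "2014-11-13 19:32:14.273478",
]).rename("created_at")

s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=index,
    name="my_series",
)

# The pre-defined sampling days: every day from 2014-12-01 to 2014-12-07.
days = pd.date_range("2014-12-01", "2014-12-07")

print(s)
print(days)
```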
Recommended answer
If you first define the set of pre-defined days (days in my example below), you can reindex the Series with it and specify a fill method. 'ffill' propagates the last valid observation forward, which for a time series means taking the most recent value:
In [19]: s
Out[19]:
time
2014-01-27 21:50:05.040961 80000.00
2014-03-12 18:46:45.517968 79900.00
2014-09-05 20:54:17.991260 63605.31
2014-11-04 01:16:08.286631 64405.31
2014-11-04 01:17:26.398272 63605.31
2014-11-04 01:24:38.225306 64405.31
2014-11-13 19:32:14.273478 65205.31
Name: my_series, dtype: float64
In [20]: days = pd.date_range('2014-12-01', '2014-12-07')
In [21]: s.reindex(days, method='ffill')
Out[21]:
2014-12-01 65205.31
2014-12-02 65205.31
2014-12-03 65205.31
2014-12-04 65205.31
2014-12-05 65205.31
2014-12-06 65205.31
2014-12-07 65205.31
Freq: D, Name: my_series, dtype: float64
In this case (with the example dates you gave), every date gets the same value, because the most recent observation in the original series is the same for all of them.
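A related tool, swapped in here as an alternative rather than taken from the answer, is Series.asof: for each requested timestamp it returns the last non-NaN value at or before it (the index must be sorted). The minimal sketch below rebuilds the question's data so it runs on its own and compares both lookups on the December dates. The two only differ when the Series contains NaN: if the matched observation is NaN, reindex with method='ffill' returns NaN, while asof keeps searching backwards for a non-NaN value.

```python
import pandas as pd

# Rebuild the question's Series (values copied from the listing).
index = pd.to_datetime([
    "2014-01-27 21:50:05.040961",
    "2014-03-12 18:46:45.517968",
    "2014-09-05 20:54:17.991260",
    "2014-11-04 01:16:08.286631",
    "2014-11-04 01:17:26.398272",
    "2014-11-04 01:24:38.225306",
    "2014-11-13 19:32:14.273478",
])
s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=index,
    name="my_series",
)
days = pd.date_range("2014-12-01", "2014-12-07")

# reindex + forward fill: for each day, take the last observation at or before it.
via_reindex = s.reindex(days, method="ffill")

# asof: same backward lookup, but it skips NaN values while searching.
via_asof = s.asof(days)

print(via_reindex)
print(via_asof)
```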
If you don't want to give a specific set of dates, but just every day from the start to the end of the original Series, you can use resample to achieve the same:
In [23]: s.resample('D', how='last', fill_method='ffill')
Out[23]:
time
2014-01-27 80000
2014-01-28 80000
2014-01-29 80000
2014-01-30 80000
...
2014-11-10 64405.31
2014-11-11 64405.31
2014-11-12 64405.31
2014-11-13 65205.31
Freq: D, Name: my_series, Length: 291
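The how= and fill_method= keywords in the resample call above were deprecated in pandas 0.18 and are not accepted by current versions. A minimal sketch of the method-chaining equivalent, assuming a recent pandas: .last() takes the last value in each daily bin and .ffill() fills the days without observations. The sketch also compares it with reindexing at midnight, because the two approaches label days differently: resample('D').last() gives the value at the end of each day (labelled with the day's start), whereas reindex(..., method='ffill') on midnight timestamps gives the value as of 00:00. They disagree on 2014-11-04, where all updates happen after 01:00.

```python
import pandas as pd

# Rebuild the question's Series (values copied from the listing).
index = pd.to_datetime([
    "2014-01-27 21:50:05.040961",
    "2014-03-12 18:46:45.517968",
    "2014-09-05 20:54:17.991260",
    "2014-11-04 01:16:08.286631",
    "2014-11-04 01:17:26.398272",
    "2014-11-04 01:24:38.225306",
    "2014-11-13 19:32:14.273478",
])
s = pd.Series(
    [80000.00, 79900.00, 63605.31, 64405.31, 63605.31, 64405.31, 65205.31],
    index=index,
    name="my_series",
)

# Method-chaining form of the old resample call:
# last observation in each calendar day, then forward-fill the empty days.
daily = s.resample("D").last().ffill()

# For comparison: sample at midnight of every day in the same span.
midnights = pd.date_range(s.index.min().normalize(), s.index.max().normalize(), freq="D")
at_midnight = s.reindex(midnights, method="ffill")

print(len(daily))
print(daily.loc["2014-11-03":"2014-11-05"])
print(at_midnight.loc["2014-11-03":"2014-11-05"])
```

Which labelling is right depends on whether a sample for a given day should include that day's updates; the midnight form never looks into the future, which is usually what "most recent value at date X" means.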