Strange behavior of pandas resampling
Question
I'm seeing rather strange behavior from the resample function of a pandas time series (Python). I'm using pandas 0.12.0, the latest version at the time of writing.
Take the following time series:
from datetime import datetime
import numpy as np
from pandas import Series
dates = [datetime(2011, 1, 2, 1), datetime(2011, 1, 2, 2), datetime(2011, 1, 2, 3),
         datetime(2011, 1, 2, 4), datetime(2011, 1, 2, 5), datetime(2011, 1, 2, 6)]
ts = Series(np.arange(6.), index=dates)
Then try resampling to 66 minutes ('66min') and to 65 minutes ('65min'). This is the result I get:
In [45]: ts.resample('66min')
Out[45]:
2011-01-02 01:00:00 0.5
2011-01-02 02:06:00 2.0
2011-01-02 03:12:00 3.0
2011-01-02 04:18:00 4.0
2011-01-02 05:24:00 5.0
Freq: 66T, dtype: float64
In [46]: ts.resample('65min')
Out[46]:
2011-01-02 01:00:00 0
2011-01-02 02:05:00 NaN
2011-01-02 03:10:00 NaN
2011-01-02 04:15:00 NaN
2011-01-02 05:20:00 NaN
2011-01-02 06:25:00 NaN
Freq: 65T, dtype: float64
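I understand the behavior when resampling to 66 minutes: each output value is the mean (the default aggregation) of all the values that fall in the respective interval. What I don't understand, and don't know how to influence, is the behavior for 65 minutes: the first bin yields 0 instead of 0.5, and every later bin is NaN.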
This is a simplified problem; the real context is a more complex data-correction process that involves resampling.
Any ideas?
Recommended answer
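Perhaps you want interpolation rather than resampling: instead of aggregating all values that fall into each bin, estimate the series' value at each regular timestamp from the neighboring observations, weighted by time. Here's one way: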
In [53]: index = pd.date_range(freq='66T', start=ts.first_valid_index(), periods=5)
In [54]: ts.reindex(ts.index.union(index)).sort_index().interpolate(method='time').loc[index]
Out[54]:
2011-01-02 01:00:00 0.0
2011-01-02 02:06:00 1.1
2011-01-02 03:12:00 2.2
2011-01-02 04:18:00 3.3
2011-01-02 05:24:00 4.4
Freq: 66T, dtype: float64
In [55]: index = pd.date_range(freq='65T', start=ts.first_valid_index(), periods=5)
In [56]: ts.reindex(ts.index.union(index)).sort_index().interpolate(method='time').loc[index]
Out[56]:
2011-01-02 01:00:00 0.000000
2011-01-02 02:05:00 1.083333
2011-01-02 03:10:00 2.166667
2011-01-02 04:15:00 3.250000
2011-01-02 05:20:00 4.333333
Freq: 65T, dtype: float64
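The two interpolation sessions above follow the same pattern, so it can be wrapped in a small function. The sketch below is a hypothetical helper (`interpolate_at` is not a pandas API). It assumes a pandas version where `.ix` is gone and `.loc` is used instead. It builds the regular target grid, merges it with the original timestamps (which serve as interpolation anchors), fills the new points by time-weighted linear interpolation, and keeps only the grid points.

```python
from datetime import datetime

import numpy as np
import pandas as pd


def interpolate_at(ts, freq, periods):
    """Hypothetical helper: sample `ts` on a regular grid by time-based interpolation."""
    # Regular target timestamps, starting at the first valid observation.
    target = pd.date_range(start=ts.first_valid_index(), freq=freq, periods=periods)
    # Union keeps the original observations (the interpolation anchors)
    # and yields a sorted DatetimeIndex; new points start out as NaN.
    combined = ts.reindex(ts.index.union(target))
    # 'time' weights the linear interpolation by the actual time gaps.
    return combined.interpolate(method='time').loc[target]


# Same hourly series as in the question.
dates = [datetime(2011, 1, 2, h) for h in range(1, 7)]
ts = pd.Series(np.arange(6.0), index=dates)

result = interpolate_at(ts, '65min', 5)
print(result)
```

Because the index is merged rather than rebinned, the result does not depend on where resample places its bin edges. That is the property the question was missing.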
That said, resample itself could probably be improved. At first glance, the behavior you've demonstrated is mysterious and, I agree, unhelpful. It's worth discussing.
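For readers on a recent pandas, a hedged note: to my knowledge, pandas 1.1 added an `origin=` parameter to `resample` that controls where bins are anchored. Its default, `'start_day'`, anchors bins at midnight of the first day rather than at the first timestamp. Modern `resample` also returns a Resampler object, so the aggregation (`.mean()`) has to be called explicitly. The sketch below assumes pandas >= 1.1 and uses `origin='start'` to anchor the first bin at 01:00, as in the question's 66-minute output. If my tracing of the bin edges is right, 66-minute and 65-minute bins then both give the per-bin means 0.5, 2.0, 3.0, 4.0, 5.0.

```python
from datetime import datetime

import numpy as np
import pandas as pd

# Same hourly series as in the question.
dates = [datetime(2011, 1, 2, h) for h in range(1, 7)]
ts = pd.Series(np.arange(6.0), index=dates)

# Assumes pandas >= 1.1 (origin= parameter). origin='start' anchors the
# bin edges at the first timestamp instead of at midnight ('start_day').
r66 = ts.resample('66min', origin='start').mean()
r65 = ts.resample('65min', origin='start').mean()

print(r66)
print(r65)
```

Without `origin='start'`, the bins start at midnight, so the labels and groupings differ from the 0.12 transcript above.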