pandas 时间序列重新采样结束给定的一天 [英] pandas timeseries resampling ending a given day

查看:34
本文介绍了 pandas 时间序列重新采样结束给定的一天的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我怀疑很多从事时间序列数据工作的人已经遇到过这个问题,而且 pandas 似乎还没有提供一个简单的解决方案(还没有!):

I suspect many people working on timeseries data have already come across this issue, and pandas doesn't seem to provide a straightforward solution (yet!):

假设:

  1. 您有一个包含收盘价的时间序列的每日数据,按日期(天)索引.
  2. 今天是 6 月 19 日.最后收盘数据值为 18JUN.
  3. 您希望将每日数据重新采样到 OHLC 条形图中,并以某个给定的频率(比如 M 或 2M)在 6 月 18 日结束.

所以对于 M 频率,最后一根柱子是 19MAY-18JUN,前一根柱子是 19APR-18MAY,依此类推......

So for M freq, last bar would be 19MAY-18JUN, previous one 19APR-18MAY, and so on...

ts.resample('M', how='ohlc')

将进行重新采样,但 'M' 是 'end_of_month' 期间,因此结果将给出 2014-05 的整月和 2014-06 的 2 周期间,因此您的最后一个条形图不会是 '每月酒吧'.这不是我们想要的!

will do the resampling, but 'M' is 'end_of_month' period so the result will give a full month for 2014-05 and a 2-week period for 2014-06, so your last bar won't be a 'monthly bar'. That's not what we want!

使用 2M 频率,给定我的样本时间序列,我的测试给了我最后一个标记为 2014-07-31(之前标记为 2014-05-31)的条形图,这非常具有误导性,因为不是 7 月的数据.... 假定的最后 2 个月柱再次仅涵盖最近 2 周.

With 2M frequency, given my sample timeseries, my test gives me last bar labelled as 2014-07-31 (and previous labelled as 2014-05-31), which is quite misleading since there's not data on JUL.... The supposed last 2-Month bar is again just covering the most recent 2 weeks.

正确的 DatetimeIndex 很容易创建:

The correct DatetimeIndex is easily created with:

pandas.date_range(end='2014-06-18', freq='2M', periods=300) + datetime.timedelta(days=18)

(Pandas 文档更喜欢通过

(Pandas documentation prefers to do the same thing via

pandas.date_range(end='2014-06-18', freq='2M', periods=300) + pandas.tseries.offsets.DateOffset(days=18)

但我的测试表明,这种方法虽然更 'pandaïc' 慢 2 倍!)

but my tests shows that this method, though more 'pandaïc' is 2x slower!)

无论哪种方式,我们都无法将正确的 DatetimeIndex 应用于 ts.resample().

Either way we can't apply the right DatetimeIndex to ts.resample().

似乎熊猫开发团队(熊猫中的日期范围)是意识到这个问题,但与此同时,您如何解决它以使 OHLC 的滚动频率锚定在时间序列的最后一天?

It seems that pandas dev team (Date ranges in Pandas) is aware of this issue, but in the meantime, how could you solve it to get OHLC with rolling frequency anchored on the last day in the timeseries?

推荐答案

这基本上是从复制/粘贴中破解的,我确信在某些情况下会失败 - 但下面是一些锚定自定义偏移的起始代码到一个月中的某一天.

This is basically hacked together from copy/paste, and I'm sure fails on some cases - but below is some starting code for a custom Offset that is anchored to a particular day in a month.

from pandas.tseries.offsets import (as_datetime, as_timestamp, apply_nat, 
                               DateOffset, relativedelta, datetime)
class MonthAnchor(DateOffset):
    """DateOffset Anchored to day in month

        Arguments:
        day_anchor: day to be anchored to
    """

    def __init__(self, n=1, **kwds):
        super(MonthAnchor, self).__init__(n)

        self.kwds = kwds
        self._dayanchor = self.kwds['day_anchor']

    @apply_nat
    def apply(self, other):
        n = self.n

        if other.day > self._dayanchor and n <= 0:  # then roll forward if n<=0
            n += 1
        elif other.day < self._dayanchor and n > 0:
            n -= 1

        other = as_datetime(other) + relativedelta(months=n)
        other = datetime(other.year, other.month, self._dayanchor)
        return as_timestamp(other)

    def onOffset(self, dt):
        return dt.day == self._dayanchor

    _prefix = ''

示例用法:

In [28]: df = pd.DataFrame(data=np.linspace(50, 100, 200), index=pd.date_range(end='2014-06-18', periods=200), columns=['value'])

In [29]: df.head()
Out[29]: 
                value
2013-12-01  50.000000
2013-12-02  50.251256
2013-12-03  50.502513
2013-12-04  50.753769
2013-12-05  51.005025


In [61]: month_offset = MonthAnchor(day_anchor = df.index[-1].day + 1)

In [62]: df.resample(month_offset, how='ohlc')
Out[62]: 
                value                                   
                 open        high        low       close
2013-11-19  50.000000   54.271357  50.000000   54.271357
2013-12-19  54.522613   62.060302  54.522613   62.060302
2014-01-19  62.311558   69.849246  62.311558   69.849246
2014-02-19  70.100503   76.884422  70.100503   76.884422
2014-03-19  77.135678   84.673367  77.135678   84.673367
2014-04-19  84.924623   92.211055  84.924623   92.211055
2014-05-19  92.462312  100.000000  92.462312  100.000000

这篇关于 pandas 时间序列重新采样结束给定的一天的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆