Remove holidays and weekends in a very long time series: how to model time series in Python?

Problem description

Is there some function in Python to handle this? Google Docs has a WEEKDAY operation, so perhaps there is something like that in Python. I am pretty sure someone must have solved this already; similar problems occur with sparse data, for example in finance and research. I am basically just trying to organize a huge number of differently sized vectors (time series) indexed by days. I am not sure how I should handle the days: mark the first day with 1 and the last day with N, use Unix time, or something else? I am also not sure whether the time series should be saved into a matrix so I could model them more easily, for example to calculate correlation matrices. Is there anything ready-made for this?
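For the first question: Python's standard library does have a weekday function, `datetime.date.weekday()` (Monday is 0, Sunday is 6). A minimal sketch, assuming holidays are supplied by hand as a set of dates (the holiday below is hypothetical), that drops weekends and holidays and then numbers the remaining days 1..N; `business_days` is a helper name introduced only for illustration:

```python
from datetime import date, timedelta


def business_days(start, end, holidays=()):
    """Yield every date from start to end (inclusive) that is neither a
    Saturday/Sunday nor listed in holidays (hypothetical helper)."""
    holidays = set(holidays)
    day = start
    while day <= end:
        # date.weekday(): Monday == 0 ... Sunday == 6, so 5 and 6 are the weekend
        if day.weekday() < 5 and day not in holidays:
            yield day
        day += timedelta(days=1)


# Hypothetical holiday list, for illustration only
holidays = {date(2024, 1, 1)}
days = list(business_days(date(2024, 1, 1), date(2024, 1, 14), holidays))

# Number the remaining days 1..N, one of the indexing options from the question
day_index = {d: n for n, d in enumerate(days, start=1)}
print(days[0], len(days))  # 2024-01-02 9
```

Integer positions like these make the business days contiguous, which suits matrix-style modelling; keeping the dates themselves (or Unix timestamps) instead preserves the real calendar gaps.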

Let's try to solve this problem without the "practical" extra clutter:

from itertools import compress, cycle

# Keep 10 items, then skip 801, repeating over 0..99999
seq = range(100000)
criteria = cycle([True] * 10 + [False] * 801)
kept = list(compress(seq, criteria))
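The same `compress`/`cycle` trick can filter out weekends if the repeating mask has a period of 7 days (5 kept, 2 dropped) instead of 10 + 801. A sketch, assuming day offset 0 falls on a Monday; a fixed weekly mask cannot know about holidays, so those would still need a separate filtering step:

```python
from datetime import date, timedelta
from itertools import compress, cycle

# Assumption: the calendar starts on a Monday (2024-01-01 is one)
start = date(2024, 1, 1)
day_offsets = range(28)  # four weeks of calendar-day offsets from start

# 7-day mask: keep the 5 weekdays, drop the 2 weekend days
weekday_mask = cycle([True] * 5 + [False] * 2)
weekday_offsets = list(compress(day_offsets, weekday_mask))
weekday_dates = [start + timedelta(days=n) for n in weekday_offsets]

print(weekday_offsets[:7])  # [0, 1, 2, 3, 4, 7, 8]
```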

Now we have to change them into days, and then turn each value in $\mathbb R$ into a tuple in $(\mathbb R, \mathbb R)$. So the function $V : \mathbb R \to \mathbb R^{2}$ is still missing; investigating.
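One possible reading of $V$ (an assumption; the post does not define it further) is a map from a business-day index $n$ to the pair (date of the $n$-th business day, value), with the date standing in for the first coordinate. NumPy's `numpy.busday_offset` handles the index-to-date part while skipping weekends and an optional holiday list. The holiday and the valuations below are made up for illustration:

```python
import numpy as np

start = np.datetime64("2024-01-01")
holidays = ["2024-01-01"]  # hypothetical holiday list, for illustration only
values = [10.0, 10.5, 10.2, 11.0, 10.8]  # made-up valuations

# Offset n -> the n-th business day counted from start.
# roll="forward" first moves start itself onto a business day if it is a weekend or holiday.
dates = np.busday_offset(start, np.arange(len(values)), roll="forward", holidays=holidays)

# V: index -> (date, value) pairs; tolist() turns datetime64[D] into datetime.date objects
pairs = list(zip(dates.tolist(), values))
print(dates)
```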

[Update]

Let's play! The code below solves a subproblem: it creates some test data to try things out. Next we need to create arbitrary days and valuations so we can test it on arbitrary time series. If we can create some function $V$, we are very close to solving this problem, though it must take holidays and weekends into account, so it may not be easy (not sure).

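import itertools as it

import numpy as np


def createRandomData():
    """Build 5 short test vectors by masking range(5) with a repeating pattern."""
    samples = []
    for x in range(5):
        seq = range(5)
        # Keep x items, then drop 3, repeating (x = 0 yields an empty vector)
        criteria = it.cycle([True] * x + [False] * 3)
        samples.append(list(it.compress(seq, criteria)))
    return samples


def createNNtriangularMatrix(data):
    """Zero-pad every vector to length N = len(data) so they stack into an N x N array.

    Assumes no vector is longer than N.
    """
    N = len(data)
    return [aa + [0] * (N - len(aa)) for aa in data]


A = createNNtriangularMatrix(createRandomData())
print(np.array(A))
# The all-zero first row has zero variance, so its correlations come out as nan
print(np.corrcoef(A))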
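Zero-padding the shorter vectors, as `createNNtriangularMatrix` does, inserts artificial zeros that change the correlations (and an all-zero row has zero variance, so its correlations are undefined). An alternative sketch, assuming pandas is available and each series carries its own dates: align the series on a shared business-day index and let missing days be NaN rather than 0. The three series are random test data, not real values:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # made-up test data, seeded for reproducibility

# Three hypothetical series of different lengths on business days
# (bdate_range skips Saturdays and Sundays)
series = {
    name: pd.Series(rng.normal(size=n), index=pd.bdate_range("2024-01-01", periods=n))
    for name, n in [("a", 30), ("b", 20), ("c", 25)]
}

# Building a DataFrame aligns the series on their dates; missing days become NaN, not 0
frame = pd.DataFrame(series)

# corr() uses pairwise complete observations, so the NaN gaps are skipped
corr = frame.corr()
print(corr.round(2))
```

Keeping the dates as the index also answers the "1..N or Unix time" question for this workflow: the index itself is the time axis, and alignment happens by date.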

Recommended answer

Try using pandas. You can create a DateOffset for business days and include your data in a DataFrame (see: http://pandas.pydata.org/pandas-docs/stable/timeseries.html) to analyze it.
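A minimal pandas sketch of that suggestion. `CustomBusinessDay` with a holiday calendar skips both weekends and holidays; `USFederalHolidayCalendar` is used here purely as an example, so substitute the calendar (or an explicit `holidays=` list) that matches your data:

```python
import pandas as pd
from pandas.tseries.holiday import USFederalHolidayCalendar
from pandas.tseries.offsets import CustomBusinessDay

# Business-day offset that skips weekends and US federal holidays (example calendar)
us_bday = CustomBusinessDay(calendar=USFederalHolidayCalendar())

# Valid days from 2024-01-01 to 2024-01-15; New Year's Day (Jan 1) and
# Martin Luther King Jr. Day (Jan 15) are dropped along with the weekends
days = pd.date_range("2024-01-01", "2024-01-15", freq=us_bday)

# Hypothetical values indexed by those days
values = pd.Series(range(len(days)), index=days, dtype=float)

# Stepping one business day from Friday 2024-01-12 skips the weekend and the holiday
next_day = pd.Timestamp("2024-01-12") + us_bday
print(next_day.date())  # 2024-01-16
```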
