Time series - correlation and lag time


Question

I am studying the correlation between a set of input variables and a response variable, price. These are all in time series.

1) Is it necessary that I smooth out the curve where the input variable is cyclical (autoregressive)? If so, how?

2) Once a correlation is established, I would like to quantify exactly how the input variable affects the response variable. E.g.: "Once X increases by more than 10%, there is a 2% increase in y 6 months later."

Which Python libraries should I be looking at to implement this, in particular to figure out the lag time between two correlated occurrences?

Example:

I already looked at: statsmodels.tsa.ARMA but it seems to deal with predicting only one variable over time. In scipy the covariance matrix can tell me about the correlation, but does not help with figuring out the lag time.

Answer

While part of the question is more statistics based, the bit about how to do it in Python seems at home here. I see from your question on Cross Validated that you've since decided to do this in R, but in case you decide to move back to Python, or for the benefit of anyone else finding this question:

I think you were in the right area looking at statsmodels.tsa, but there's a lot more to it than just the ARMA model:

http://statsmodels.sourceforge.net/devel/tsa.html

In particular, have a look at statsmodels.tsa.vector_ar for modelling multivariate time series. The documentation for it is available here:

http://statsmodels.sourceforge.net/devel/vector_ar.html


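The page above specifies that it's for working with stationary time series, i.e. series whose mean, variance and autocorrelation structure do not change over time. I presume this means removing both trend and any seasonality or periodicity first (for example by differencing) rather than smoothing the curve. The following link is ultimately readying a model for forecasting, but it discusses the Box-Jenkins approach for building a model, including making the series stationary: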

http://www.colorado.edu/geography/class_homepages/geog_4023_s11/Lecture16_TS3.pdf

You'll notice that link discusses looking for autocorrelations (ACF) and partial autocorrelations (PACF), and then using the Augmented Dickey-Fuller test to test whether the series is now stationary. Tools for all three can be found in statsmodels.tsa.stattools (acf, pacf and adfuller). Likewise, statsmodels.tsa.arima_process has arma_acf and arma_pacf, which give the theoretical ACF and PACF of a specified ARMA process.

The above link also discusses using metrics like AIC to determine the best model; both statsmodels.tsa.vector_ar.var_model and statsmodels.tsa.ar_model include AIC (amongst other measures). The same measures seem to be used for choosing the lag order in var_model, using select_order.


In addition, the pandas library is at least partially integrated into statsmodels and has a lot of time series and data analysis functionality itself, so will probably be of interest. The time series documentation is located here:

http://pandas.pydata.org/pandas-docs/stable/timeseries.html
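For a quick first look at lag time using pandas alone, one simple approach (a sketch, not a substitute for a proper model) is to correlate the response against shifted copies of the input and see which shift gives the strongest correlation. The monthly series below are synthetic and assumed for illustration, with y built to trail x by 6 months.

```python
import numpy as np
import pandas as pd

# Synthetic monthly data (assumed for illustration): y trails x by 6 months.
rng = np.random.default_rng(3)
idx = pd.date_range("2000-01-01", periods=200, freq="MS")
x = pd.Series(rng.normal(size=200), index=idx)
noise = pd.Series(rng.normal(scale=0.2, size=200), index=idx)
y = 0.8 * x.shift(6) + noise

# Correlate y with x shifted by 0..12 months; Series.corr skips NaN pairs.
lagged_corr = pd.Series({lag: y.corr(x.shift(lag)) for lag in range(13)})

# The lag with the largest absolute correlation is the candidate lag time.
best_lag = int(lagged_corr.abs().idxmax())
print(lagged_corr.round(2))
print("strongest correlation at lag:", best_lag)
```

This only makes sense on stationary inputs; on trending series the correlations are inflated at every lag, so it is safer to difference the data (or use percentage changes) first.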
