How to predict time series in scikit-learn?

Question

Scikit-learn uses a very convenient approach based on the fit and predict methods. I have time-series data in a format suited to fit and predict.

For example, I have the following Xs:

[[1.0, 2.3, 4.5], [6.7, 2.7, 1.2], ..., [3.2, 4.7, 1.1]]

and the corresponding ys:

[[1.0], [2.3], ..., [7.7]]

These data have the following meaning. The values stored in ys form a time series. The values in Xs are the corresponding time-dependent "factors" that are known to have some influence on the values in ys (for example: temperature, humidity and atmospheric pressure).

Now, of course, I can use fit(Xs, ys). But then I get a model in which future values in ys depend only on the factors and do not depend on the previous Y values (at least not directly), and this is a limitation of the model. I would like to have a model in which Y_n also depends on Y_{n-1}, Y_{n-2} and so on. For example, I might want to use an exponential moving average as a model. What is the most elegant way to do this in scikit-learn?

ADDED

As has been mentioned in the comments, I can extend Xs by adding ys. But this approach has some limitations. For example, if I add the last 5 values of y as 5 new columns to X, the information about the time ordering of ys is lost: nothing in X indicates that the value in the 5th column follows the value in the 4th column, and so on. As a model, I might want a linear fit of the last five ys and use the fitted linear function to make a prediction. But if I have 5 values in 5 columns, that is not so trivial.

ADDED 2

To make my problem even clearer, I would like to give one concrete example. I would like to have a "linear" model in which y_n = c + k1*x1 + k2*x2 + k3*x3 + k4*EMOV_n, where EMOV_n is just an exponential moving average. How can I implement this simple model in scikit-learn?

Recommended answer

This might be what you're looking for, with regard to the exponentially weighted moving average:

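import pandas, numpy
# pandas.stats.moments.ewma has been removed from pandas;
# Series.ewm(...).mean() is the current equivalent.
# numpy.ravel flattens the nested ys ([[1.0], [2.3], ...]) into a 1-D series.
EMOV_n = pandas.Series(numpy.ravel(ys)).ewm(com=2).mean().to_numpy()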

Here, com is the center-of-mass parameter of the decay, described in the pandas documentation; it corresponds to a smoothing factor alpha = 1 / (1 + com). Then you can append EMOV_n to Xs as an extra column, using something like:

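# column_stack adds EMOV_n as a new column (one value per time step);
# vstack would try to append it as a row and fail on the shape mismatch.
Xs = numpy.column_stack((Xs, EMOV_n))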
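
One caveat, sketched here under assumptions that are not part of the original answer: an EMA that includes y_n itself hands the target to the model. If the goal is for y_n to depend only on y_{n-1}, y_{n-2} and so on, one option is to shift the EMA by one step before stacking it. The toy arrays and the names EMOV_prev, Xs_ext, Xs_fit and ys_fit are hypothetical.

```python
import numpy
import pandas

# Toy data (hypothetical): 6 time steps, 3 factors per step.
Xs = numpy.arange(18, dtype=float).reshape(6, 3)
ys = numpy.array([[1.0], [2.0], [4.0], [3.0], [5.0], [6.0]])

# EMA of the series, shifted so that row n only sees y_0 .. y_{n-1}.
ema = pandas.Series(ys.ravel()).ewm(com=2).mean()
EMOV_prev = ema.shift(1)

# Append the feature as a fourth column, one value per time step.
Xs_ext = numpy.column_stack((Xs, EMOV_prev.to_numpy()))

# The first row has no history, so its EMA is NaN; drop it before fitting.
mask = ~numpy.isnan(Xs_ext).any(axis=1)
Xs_fit, ys_fit = Xs_ext[mask], ys[mask]
print(Xs_fit.shape, ys_fit.shape)
```

Whether to shift depends on what is known at prediction time; if y_n is genuinely available when predicting, the unshifted EMA may be acceptable.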

You can then look at the various linear models in scikit-learn and do something like:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit(Xs, ys)
# coef_ holds k1..k4; the constant c is in clf.intercept_
print(clf.coef_)
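
For illustration, the whole recipe might look like the following end-to-end sketch. The synthetic data, the 80/20 chronological split, and the choice to build the EMA from past values only are assumptions of this example, not something the answer prescribes.

```python
import numpy
import pandas
from sklearn.linear_model import LinearRegression

# Synthetic data (hypothetical): 200 time steps, 3 factors, known coefficients.
rng = numpy.random.default_rng(0)
n = 200
Xs = rng.normal(size=(n, 3))
ys = Xs @ numpy.array([0.5, -1.0, 2.0]) + rng.normal(scale=0.1, size=n)

# EMA over past values only (shifted by one step), added as a 4th column.
EMOV_prev = pandas.Series(ys).ewm(com=2).mean().shift(1).to_numpy()
features = numpy.column_stack((Xs, EMOV_prev))[1:]  # drop row 0: no history
target = ys[1:]

# Split chronologically so the model is scored on later time steps only.
split = int(0.8 * len(target))
model = LinearRegression().fit(features[:split], target[:split])

print(model.intercept_)  # c
print(model.coef_)       # k1, k2, k3, k4
print(model.score(features[split:], target[split:]))  # R^2 on the held-out tail
```

A chronological split is used instead of a shuffled one so that the held-out rows always come after the training rows in time.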

Good luck!
