How to predict time series in scikit-learn?
Question
Scikit-learn uses a very convenient approach based on the fit and predict methods. I have time-series data in the format suited for fit and predict.
For example, I have the following Xs:
[[1.0, 2.3, 4.5], [6.7, 2.7, 1.2], ..., [3.2, 4.7, 1.1]]
and the corresponding ys:
[[1.0], [2.3], ..., [7.7]]
These data have the following meaning. The values stored in ys form a time series. The values in Xs are the corresponding time-dependent "factors" that are known to have some influence on the values in ys (for example: temperature, humidity and atmospheric pressure).
Now, of course, I can use fit(Xs, ys). But then I get a model in which future values in ys depend only on the factors and do not depend (at least directly) on the previous Y values, which is a limitation of the model. I would like to have a model in which Y_n also depends on Y_{n-1}, Y_{n-2}, and so on. For example, I might want to use an exponential moving average as a model. What is the most elegant way to do this in scikit-learn?
Added
As mentioned in the comments, I can extend Xs by adding ys. But this approach has some limitations. For example, if I add the last 5 values of y as 5 new columns to X, the information about the time ordering of ys is lost: nothing in X indicates that the value in the 5th column follows the value in the 4th column, and so on. As a model, I might want a linear fit of the last five ys and use the fitted linear function to make a prediction. But if I have 5 values in 5 columns, that is not so trivial.
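The lag-column idea discussed above can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic stand-ins for Xs and ys, and the column names (x1, y_lag_1, ...) are made up here. Each lag gets its own fixed column, so a linear model learns one coefficient per lag and the ordering is carried by the column position. A straight-line extrapolation through the last five values is itself a weighted sum of those five values, so a linear regression over these columns can, in principle, represent it.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-ins for the question's Xs and ys (assumed data, illustration only).
rng = np.random.default_rng(0)
n_samples, n_lags = 200, 5
Xs = rng.normal(size=(n_samples, 3))
ys = np.cumsum(rng.normal(size=n_samples))  # a random walk, so past values matter

frame = pd.DataFrame(Xs, columns=["x1", "x2", "x3"])
y = pd.Series(ys, name="y")

# Column y_lag_k holds y_{n-k}; the column itself encodes how far back the value is.
for k in range(1, n_lags + 1):
    frame[f"y_lag_{k}"] = y.shift(k)

# The first n_lags rows have no complete history, so drop them.
valid = frame.notna().all(axis=1)
X_lagged, y_target = frame[valid], y[valid]

model = LinearRegression().fit(X_lagged, y_target)
lag_coefs = dict(zip(X_lagged.columns, model.coef_))  # one coefficient per column
```

Inspecting lag_coefs shows which lags the fitted model relies on most.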
Added 2
To make my problem even clearer, I would like to give one concrete example. I would like to have a "linear" model in which y_n = c + k1*x1 + k2*x2 + k3*x3 + k4*EMOV_n, where EMOV_n is just an exponential moving average. How can I implement this simple model in scikit-learn?
Recommended answer
With regard to the exponentially weighted moving average, this might be what you're looking for:
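import numpy
import pandas

# pandas.stats.moments.ewma has been removed from pandas; Series.ewm(...).mean() replaces it.
# numpy.ravel flattens ys from shape (n, 1) to (n,).
EMOV_n = pandas.Series(numpy.ravel(ys)).ewm(com=2).mean().to_numpy()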
Here, com is the center-of-mass parameter that controls how quickly the weights decay; it is described in the pandas documentation. Then you can add EMOV_n to Xs as an extra column, using something like:
# column_stack appends EMOV_n as a new column; vstack would try to append it as a row
Xs = numpy.column_stack((Xs, EMOV_n))
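As a side note on com: in pandas it maps to the smoothing factor as alpha = 1 / (1 + com). The short check below is a sketch with made-up values for ys. It passes adjust=False so that the result follows the plain recursive definition, whereas the pandas default (adjust=True) uses a slightly different weighting for the first few points.

```python
import numpy as np
import pandas as pd

ys = np.array([1.0, 2.3, 3.1, 2.8, 4.0, 7.7])  # made-up values for illustration
series = pd.Series(ys)

com = 2
alpha = 1.0 / (1.0 + com)  # smoothing factor implied by com

ema_from_com = series.ewm(com=com, adjust=False).mean()
ema_from_alpha = series.ewm(alpha=alpha, adjust=False).mean()

# With adjust=False the average follows the usual recursion:
# ema[0] = y[0]; ema[t] = (1 - alpha) * ema[t-1] + alpha * y[t]
manual = [ys[0]]
for value in ys[1:]:
    manual.append((1 - alpha) * manual[-1] + alpha * value)
```

All three results agree, so com=2 corresponds to alpha = 1/3.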
Then you can look at the various linear models available in scikit-learn and do something like:
from sklearn import linear_model

clf = linear_model.LinearRegression()
clf.fit(Xs, ys)
print(clf.coef_)  # one fitted coefficient per column of Xs
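Putting the pieces together, an end-to-end version might look like the sketch below. It rests on several assumptions that are not part of the answer: the data are synthetic, the moving average is shifted by one step so that row n only sees y_{n-1} and earlier (otherwise y_n would contribute to its own feature), and the model is checked on a later, held-out part of the series rather than on a random split.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic data standing in for the question's Xs and ys (assumption).
rng = np.random.default_rng(42)
n_samples = 300
Xs = rng.normal(size=(n_samples, 3))
ys = np.cumsum(rng.normal(size=n_samples)) + Xs @ np.array([0.5, -1.0, 2.0])

# EMA of *past* y values only: shift(1) makes row n use y_{n-1}, y_{n-2}, ...
ema_prev = pd.Series(ys).ewm(com=2).mean().shift(1)

features = np.column_stack((Xs, ema_prev.to_numpy()))[1:]  # row 0 has no history
target = ys[1:]

# Keep time order: fit on the earlier part, evaluate on the later part.
split = int(0.8 * len(target))
model = LinearRegression().fit(features[:split], target[:split])
r2_holdout = model.score(features[split:], target[split:])

print("c =", model.intercept_)
print("k1..k4 =", model.coef_)
print("hold-out R^2 =", r2_holdout)
```

Here model.intercept_ plays the role of c and model.coef_ holds k1 through k4 from the question's formula.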
Good luck!