How to implement walk-forward testing in sklearn?


Question

In sklearn, GridSearchCV can take a pipeline as a parameter and find the best estimator through cross-validation. However, the usual cross-validation is plain k-fold, where each fold takes a turn as the test set regardless of time order.

To cross-validate time series data, the training and testing data are usually split in walk-forward fashion instead: each fold trains on an earlier window and tests on the window that follows it.

That is to say, the testing data should always come later in time than the training data.

My ideas so far:


  1. Write my own k-fold class and pass it to GridSearchCV, so I can keep the convenience of the pipeline. The problem is that it seems difficult to make GridSearchCV use specified training and testing indices.

  2. Write a new class, GridSearchWalkForwardTest, similar to GridSearchCV. I am studying the source code in grid_search.py and finding it a little complicated.

Any suggestions are welcome.

Recommended answer

I think you could use TimeSeriesSplit, either instead of your own implementation or as a basis for implementing a CV method that works exactly as you describe.
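As a rough sketch of that suggestion, a TimeSeriesSplit instance can be passed directly as the `cv` argument of GridSearchCV, so the pipeline workflow stays unchanged. The synthetic data, the Ridge model, the StandardScaler step, and the `alpha` grid below are illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, time-ordered data (assumption): row i is observed before row i + 1.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=120)

# Illustrative pipeline; any estimator or preprocessing step could be used here.
pipeline = Pipeline([("scale", StandardScaler()), ("model", Ridge())])

# Each split trains on an initial segment and tests on the block right after it.
cv = TimeSeriesSplit(n_splits=5)

search = GridSearchCV(
    pipeline,
    param_grid={"model__alpha": [0.1, 1.0, 10.0]},
    cv=cv,
)
search.fit(X, y)
print(search.best_params_)
```

The rows must already be sorted by time, since TimeSeriesSplit splits purely by row position.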

After digging around a bit, I found that someone added a max_train_size parameter to TimeSeriesSplit in this PR, which seems to do what you want.
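To see what max_train_size does, the short sketch below prints the indices of each split on a toy array. The array length, `n_splits=4`, and `max_train_size=3` are arbitrary values chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Twelve time-ordered samples (toy data, assumption).
X = np.arange(12).reshape(-1, 1)

# max_train_size caps the training window, so it rolls forward
# instead of growing with every split.
tscv = TimeSeriesSplit(n_splits=4, max_train_size=3)

splits = [(train.tolist(), test.tolist()) for train, test in tscv.split(X)]
for train, test in splits:
    print("train:", train, "test:", test)
```

Each training window holds at most three samples and always ends right before its test block, which is the rolling walk-forward scheme described in the question. Without max_train_size, the training window would instead expand from the start of the series.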
