Stratified cross-validation of time series data

Views: 179 · Published: 2020/5/24 3:19:48 · Tags: python, pandas, scikit-learn, time-series, cross-validation


Question

I want to do time series cross-validation based on group (the grp column). In the sample data below, temperature is my target variable.

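import numpy as np
import pandas as pd

# Six timestamps, one second apart
timeS = pd.date_range(start='1980-01-01 00:00:00', end='1980-01-01 00:00:05', freq='s')

# Two groups (A and B) of three rows each; temperature is the target (stored as strings here)
df = pd.DataFrame(dict(time=timeS, grp=['A']*3 + ['B']*3, material=[1,2,3]*2,
                       temperature=['2.4','5','9.9']*2))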


  grp  material  temperature                 time
0   A         1          2.4  1980-01-01 00:00:00
1   A         2            5  1980-01-01 00:00:01
2   A         3          9.9  1980-01-01 00:00:02
3   B         1          2.4  1980-01-01 00:00:03
4   B         2            5  1980-01-01 00:00:04
5   B         3          9.9  1980-01-01 00:00:05

I am planning to add some lag features per group (grp) using this code.

df.groupby("grp")['temperature'].shift(-1)
0      5
1    9.9
2    NaN
3      5
4    9.9
5    NaN
Name: temperature, dtype: object
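
One thing worth noting about the call above: shift(-1) pulls the next value in each group forward (a lead), while shift(1) brings the previous value down (a lag). The sketch below shows both side by side. It assumes temperature has been stored as floats rather than strings, and the column names temp_lag1 and temp_lead1 are made up for illustration.

```python
import pandas as pd

# Same groups as the sample data, with temperature as floats (an assumption).
df = pd.DataFrame({
    "grp": ["A"] * 3 + ["B"] * 3,
    "temperature": [2.4, 5.0, 9.9] * 2,
})

by_grp = df.groupby("grp")["temperature"]

# shift(1): value from the previous row of the same group (a lag feature).
df["temp_lag1"] = by_grp.shift(1)

# shift(-1): value from the next row of the same group (a lead, as in the question).
df["temp_lead1"] = by_grp.shift(-1)

print(df)
```

The first row of each group has no lag and the last row has no lead, so those cells are NaN; shifting inside the groupby keeps values from leaking across the A/B boundary.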

The problem I have now is that when I do cross-validation I can use sklearn.model_selection.TimeSeriesSplit, but it does not take the group effect into account. Can anyone tell me how to do the CV split per group (like a stratified split)? I am going to use xgboost.cv for cross-validation, if that helps.

The time range differs per group. Within a group, time increases uniformly (one second per row).
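
One possible way to get a per-group time-ordered split, offered as a minimal sketch rather than a confirmed answer: run TimeSeriesSplit separately inside each group and concatenate the k-th fold of every group. The helper name grouped_time_series_folds is hypothetical, and the sketch assumes the groups are independent series with at least n_splits + 1 rows each.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import TimeSeriesSplit


def grouped_time_series_folds(df, group_col="grp", time_col="time", n_splits=2):
    """Return a list of (train_idx, test_idx) positional index arrays.

    Hypothetical helper: each group is split on its own with TimeSeriesSplit,
    then the k-th fold of every group is merged, so within a group the
    training rows always come before the test rows.
    """
    splitter = TimeSeriesSplit(n_splits=n_splits)
    train_parts = [[] for _ in range(n_splits)]
    test_parts = [[] for _ in range(n_splits)]
    times = df[time_col].to_numpy()

    # GroupBy.indices maps each group label to its row positions.
    for idx in df.groupby(group_col).indices.values():
        idx = idx[np.argsort(times[idx], kind="stable")]  # order rows by time
        for k, (tr, te) in enumerate(splitter.split(idx)):
            train_parts[k].append(idx[tr])
            test_parts[k].append(idx[te])

    return [(np.concatenate(tr), np.concatenate(te))
            for tr, te in zip(train_parts, test_parts)]


# Sample data shaped like the question's frame.
df = pd.DataFrame({
    "time": pd.date_range("1980-01-01 00:00:00", periods=6, freq="s"),
    "grp": ["A"] * 3 + ["B"] * 3,
    "material": [1, 2, 3] * 2,
    "temperature": [2.4, 5.0, 9.9] * 2,
})

folds = grouped_time_series_folds(df, n_splits=2)
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

The resulting list of index pairs has the shape that the folds argument of xgboost.cv accepts, so it could be passed there instead of nfold. Because the groups cover different time ranges, a fold may train on rows from one group that are later in absolute time than test rows from another; if that counts as leakage for the problem at hand, a split on global time cut-offs would be needed instead.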
