How to get a non-shuffled train_test_split in sklearn

Question

If I want a random train/test split, I use the sklearn helper function:

In [1]: from sklearn.model_selection import train_test_split
   ...: train_test_split([1,2,3,4,5,6])
   ...:
Out[1]: [[1, 6, 4, 2], [5, 3]]

What is the most concise way to get a non-shuffled train/test split, i.e.

[[1,2,3,4], [5,6]]

EDIT: Currently I am using

train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):] 

but hoping for something a little nicer. I have opened an issue on sklearn https://github.com/scikit-learn/scikit-learn/issues/8844
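
For reference, the slicing above can be wrapped in a small dependency-free helper. This is only a sketch: `sequential_split` is a hypothetical name (not part of sklearn), and rounding the test set up is an assumption chosen so that six items with `test_size=0.25` give the 4/2 split asked for.

```python
import math

def sequential_split(data, test_size=0.25):
    """Split a sequence into (train, test) without shuffling.

    sequential_split is a hypothetical helper name, not an sklearn function.
    """
    # Assumption: round the test set up, so 6 items at 0.25 -> 2 test items
    n_test = math.ceil(len(data) * test_size)
    cut = len(data) - n_test
    # Everything before the cut is train, everything after is test
    return data[:cut], data[cut:]

train, test = sequential_split([1, 2, 3, 4, 5, 6])
print(train, test)
```

It works on anything that supports `len()` and slicing (lists, tuples, NumPy arrays).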

EDIT 2: My PR has been merged. As of scikit-learn version 0.19, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.

Recommended answer

I'm not adding much to Psidom's answer except an easy-to-copy-paste function:

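import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # Round the test set up, then cut at the train/test boundary
    n_test = int(np.ceil(test_size * X.shape[0]))
    i = X.shape[0] - n_test
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test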

Update: As of scikit-learn 0.19 this feature is built in, so now you can do:

from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.2, shuffle=False)
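
As a quick check against the list from the question, here is a minimal sketch assuming scikit-learn 0.19 or later is installed. With the default test_size of 0.25 it should give the [[1, 2, 3, 4], [5, 6]] split the question asks for.

```python
from sklearn.model_selection import train_test_split

data = [1, 2, 3, 4, 5, 6]

# shuffle=False keeps the original order: the first part becomes train,
# the remainder becomes test (test_size defaults to 0.25 here)
train, test = train_test_split(data, shuffle=False)
print(train, test)
```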
