How to get a non-shuffled train_test_split in sklearn
Problem Description
If I want a random train/test split, I use the sklearn helper function:
In [1]: from sklearn.model_selection import train_test_split
...: train_test_split([1,2,3,4,5,6])
...:
Out[1]: [[1, 6, 4, 2], [5, 3]]
What is the most concise way to get a non-shuffled train/test split, i.e.
[[1,2,3,4], [5,6]]
EDIT: Currently I am using
train, test = data[:int(len(data) * 0.75)], data[int(len(data) * 0.75):]
but I'm hoping for something a little nicer. I have opened an issue on sklearn: https://github.com/scikit-learn/scikit-learn/issues/8844
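The slicing approach above can be wrapped in a small helper so the cut index is computed only once. This is a minimal sketch: `ordered_split` is a hypothetical name, and rounding the test size up is an assumption chosen to match how `train_test_split` sizes a float `test_size`. It works on any sliceable sequence, such as a list or a NumPy array.

```python
import math
from typing import Sequence, Tuple


def ordered_split(data: Sequence, test_size: float = 0.25) -> Tuple[Sequence, Sequence]:
    """Hypothetical helper: split `data` into (train, test) without shuffling.

    The first part becomes the training set and the last part the test set.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    # Round the number of test items up, as train_test_split does for a float test_size
    n_test = math.ceil(test_size * len(data))
    cut = len(data) - n_test
    return data[:cut], data[cut:]


print(ordered_split([1, 2, 3, 4, 5, 6]))  # → ([1, 2, 3, 4], [5, 6])
```

Because the split point is derived from the test fraction, the helper reproduces the `[[1,2,3,4], [5,6]]` split from the question with the default `test_size=0.25`.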
EDIT 2: My PR has been merged. In scikit-learn version 0.19 and later, you can pass the parameter shuffle=False to train_test_split to obtain a non-shuffled split.
Recommended Answer
I'm not adding much to Psidom's answer beyond an easy-to-copy-paste function:
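import math
import numpy as np

def non_shuffling_train_test_split(X, y, test_size=0.2):
    # First test-set row: round the test size up, as train_test_split does
    i = X.shape[0] - math.ceil(test_size * X.shape[0])
    X_train, X_test = np.split(X, [i])
    y_train, y_test = np.split(y, [i])
    return X_train, X_test, y_train, y_test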
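`train_test_split` accepts any number of arrays, so a copy-paste helper can do the same. The sketch below is a hypothetical variant (`non_shuffling_split` is not a scikit-learn function) that cuts every array at the same row index and returns the pieces in the `a_train, a_test, b_train, b_test, ...` order that `train_test_split` uses. It assumes all inputs have the same number of rows and rounds the test size up.

```python
import math

import numpy as np


def non_shuffling_split(*arrays, test_size=0.2):
    """Hypothetical helper: split each array at the same row index, keeping order.

    Returns [a_train, a_test, b_train, b_test, ...].
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of rows")
    cut = n - math.ceil(test_size * n)  # test size rounded up
    result = []
    for a in arrays:
        # np.split along axis 0 returns [a[:cut], a[cut:]]
        result.extend(np.split(np.asarray(a), [cut]))
    return result


X = np.arange(20).reshape(10, 2)
y = np.arange(10)
X_train, X_test, y_train, y_test = non_shuffling_split(X, y, test_size=0.2)
print(X_train.shape, X_test.shape)  # → (8, 2) (2, 2)
print(y_test)                       # → [8 9]
```

Since rows are never reordered, `X_test[k]` still lines up with `y_test[k]`.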
Update: This feature has since become built in (scikit-learn 0.19 and later), so now you can do:
from sklearn.model_selection import train_test_split
train_test_split(X, y, test_size=0.2, shuffle=False)
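A quick check of the built-in on the data from the question, assuming scikit-learn 0.19 or later. With `shuffle=False` and the default `test_size` of 0.25, the last ceil(0.25 × 6) = 2 items go to the test set. Stratification relies on shuffling, so passing `stratify` together with `shuffle=False` raises a `ValueError`; the sketch only checks that an error is raised, not its exact message.

```python
from sklearn.model_selection import train_test_split

data = [1, 2, 3, 4, 5, 6]

# Default test_size is 0.25, so ceil(0.25 * 6) = 2 items go to the test set
train, test = train_test_split(data, shuffle=False)
print(train, test)  # → [1, 2, 3, 4] [5, 6]

# Without shuffling, the result is the same as plain slicing
assert (train, test) == (data[:4], data[4:])

# Stratification needs shuffling, so this combination is rejected
labels = [0, 1, 0, 1, 0, 1]
try:
    train_test_split(data, labels, shuffle=False, stratify=labels)
    stratify_rejected = False
except ValueError:
    stratify_rejected = True
print(stratify_rejected)  # → True
```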