What does KFold in Python do exactly?
Question
I am going through this tutorial: https://www.dataquest.io/mission/74/getting-started-with-kaggle
I got to part 9, making predictions. There, some data in a dataframe called titanic is divided up into folds using:
# Generate cross validation folds for the titanic dataset. It returns the row indices corresponding to train and test.
# We set random_state to ensure we get the same splits every time we run this.
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)
I am not sure what exactly it is doing or what kind of object kf is. I tried reading the documentation, but it did not help much. Also, there are three folds (n_folds=3), so why does this later line only access train and test (and how do I know they are called train and test)?
for train, test in kf:
Recommended answer
KFold provides train/test indices to split data into train and test sets. It splits the dataset into k consecutive folds (without shuffling by default). Each fold is then used as a validation set once, while the k - 1 remaining folds form the training set (source: the scikit-learn KFold documentation).
Say your data has indices from 1 to 10. If you use n_folds=k, then in the i-th iteration (i <= k) you get the i-th fold as the test indices and the remaining k - 1 folds (everything except that i-th fold) together as the train indices.
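To make the mechanics concrete, here is a minimal pure-Python sketch of contiguous, unshuffled fold generation. The helper name kfold_indices is hypothetical and not part of scikit-learn; it only mimics the behaviour described above. The loop body runs once per fold, so with three folds you get three (train, test) pairs. The names train and test are ordinary loop variables chosen by whoever writes the loop.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for contiguous, unshuffled folds.

    Hypothetical helper for illustration only; not a scikit-learn function.
    """
    # Spread any remainder over the first folds so sizes differ by at most one.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for i in range(n_folds):
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        # Everything outside [start, stop) belongs to the training indices.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# One (train, test) pair is produced per fold.
for train, test in kfold_indices(12, 3):
    print(train, test)
```

Every index lands in the test part exactly once across the iterations, which is why each fold serves as the validation set once.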
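Example
import numpy as np
from sklearn.cross_validation import KFold  # module path used before sklearn 0.20
x = [1,2,3,4,5,6,7,8,9,10,11,12]
kf = KFold(12, n_folds=3)  # 12 samples split into 3 folds
# Each iteration yields one pair of index arrays: (train, test)
for train_index, test_index in kf:
    print(train_index, test_index)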
Output
Fold 1: [ 4 5 6 7 8 9 10 11] [0 1 2 3]
Fold 2: [ 0 1 2 3 8 9 10 11] [4 5 6 7]
Fold 3: [ 0 1 2 3 4 5 6 7] [ 8 9 10 11]
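Note that these are positions in the data, not the values themselves. A small sketch of how one pair of index arrays might be used to select the actual values, with the indices copied by hand from fold 1 of the output above (the variable names x_train and x_test are illustrative assumptions):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# Indices copied from fold 1 of the output above.
train_index = np.array([4, 5, 6, 7, 8, 9, 10, 11])
test_index = np.array([0, 1, 2, 3])

# Indexing with an integer array selects the elements at those positions.
x_train = x[train_index]
x_test = x[test_index]

print(x_train)
print(x_test)
```

For a pandas dataframe such as titanic, the analogous positional selection would typically be titanic.iloc[train_index].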
Import update for sklearn 0.20:
KFold now lives in the sklearn.model_selection module; the old sklearn.cross_validation module was deprecated in 0.18 and removed in 0.20. To import KFold in sklearn 0.20+, use from sklearn.model_selection import KFold. The usage changed along with the import: the constructor takes n_splits instead of the sample count and n_folds, and the index pairs come from kf.split(X) rather than from iterating over kf directly. See the current KFold documentation.
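As a rough sketch, the same 12-element example might look like this with the sklearn.model_selection API. It assumes a scikit-learn version that provides this module; the variable names are illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold

x = np.arange(1, 13)  # the same 12 values as in the example above

# n_splits replaces n_folds; the number of samples is no longer passed in.
kf = KFold(n_splits=3)

# split() yields one (train_index, test_index) pair per fold.
splits = list(kf.split(x))
for fold, (train_index, test_index) in enumerate(splits, start=1):
    print("Fold", fold, train_index, test_index)
```

With the default shuffle=False the folds are the same contiguous blocks as before. random_state only matters when shuffle=True is also set.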