What does KFold in Python exactly do?


Question


I am looking at this tutorial: https://www.dataquest.io/mission/74/getting-started-with-kaggle

I got to part 9, making predictions. There, some data in a dataframe called titanic is divided up into folds using:

# Generate cross validation folds for the titanic dataset.  It returns the row indices corresponding to train and test.
# We set random_state to ensure we get the same splits every time we run this.
kf = KFold(titanic.shape[0], n_folds=3, random_state=1)

I am not sure what exactly it is doing and what kind of object kf is. I tried reading the documentation but it did not help much. Also, there are three folds (n_folds=3), so why does this line later access only train and test (and how do I know they are called train and test)?

for train, test in kf:

Solution

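KFold provides train/test indices to split data into train and test sets. It splits the dataset into k consecutive folds (without shuffling by default). Each fold is then used once as a validation set while the k - 1 remaining folds form the training set (source). The kf object is an iterable over these splits: each pass through for train, test in kf yields one pair of index arrays, so with three folds the loop body runs three times. The names train and test are simply the loop variables the tutorial chose.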

Let's say you have some data indices from 1 to 10. If you use n_folds=k, then in the i-th iteration (i<=k) you get the i-th fold as test indices and the remaining (k-1) folds (everything except that i-th fold) together as train indices.
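The splitting rule described above can be sketched in plain Python. This is only an illustration of the idea under the assumption of no shuffling, not scikit-learn's actual implementation; the function name kfold_indices is made up for this sketch.

```python
def kfold_indices(n_samples, n_folds):
    """Yield (train, test) index lists for consecutive, unshuffled folds.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for i in range(n_folds):
        stop = start + base + (1 if i < extra else 0)
        test = list(range(start, stop))
        # Everything outside the current test fold becomes training data.
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# One (train, test) pair per fold, so this loop runs three times.
for fold, (train, test) in enumerate(kfold_indices(12, 3), start=1):
    print(fold, train, test)
```

Every index lands in the test list exactly once across the folds, which is why the loop only ever needs two names per iteration.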

An example

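import numpy as np
# Legacy import: sklearn.cross_validation only exists in scikit-learn < 0.20
from sklearn.cross_validation import KFold

x = [1,2,3,4,5,6,7,8,9,10,11,12]
# Legacy signature: number of samples first, then the number of folds
kf = KFold(12, n_folds=3)

# Each iteration yields one (train indices, test indices) pair
for train_index, test_index in kf:
    print(train_index, test_index)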

Output

Fold 1: [ 4 5 6 7 8 9 10 11] [0 1 2 3]

Fold 2: [ 0 1 2 3 8 9 10 11] [4 5 6 7]

Fold 3: [0 1 2 3 4 5 6 7] [ 8 9 10 11]
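Note that these are positions in the data, not the data values themselves. As a small sketch of how they might be used, the snippet below writes the three index pairs from the output above out by hand (so it runs without scikit-learn) and uses them to pick elements of x. The variable names splits, x_train, x_test and selected are chosen only for this illustration.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# The three (train, test) index pairs from the output above, copied by hand.
splits = [
    ([4, 5, 6, 7, 8, 9, 10, 11], [0, 1, 2, 3]),
    ([0, 1, 2, 3, 8, 9, 10, 11], [4, 5, 6, 7]),
    ([0, 1, 2, 3, 4, 5, 6, 7], [8, 9, 10, 11]),
]

selected = []
# The loop variable names are arbitrary; `for a, b in splits` works the same.
for train, test in splits:
    x_train = [x[i] for i in train]  # values at the training positions
    x_test = [x[i] for i in test]    # values at the test positions
    selected.append((x_train, x_test))
    print(x_train, x_test)
```

With a dataframe such as titanic, the same index arrays would typically be used to select rows rather than list elements.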

Import Update for sklearn 0.20:

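The sklearn.cross_validation module was deprecated in version 0.18 and removed in version 0.20; KFold now lives in the sklearn.model_selection module. To import KFold in sklearn 0.20+, use from sklearn.model_selection import KFold. The constructor changed as well: it takes n_splits instead of the sample count and n_folds, and the index pairs come from kf.split(X) rather than from iterating over kf directly. See the current KFold documentation.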
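For reference, here is how the example above might look with the newer module. This is a sketch assuming scikit-learn 0.20 or later is installed; it reuses the same 12-element list as before.

```python
from sklearn.model_selection import KFold

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]

# n_splits replaces the old n_folds; the data size is no longer passed here.
kf = KFold(n_splits=3)

# split() yields one (train_index, test_index) pair of NumPy arrays per fold.
splits = list(kf.split(x))
for train_index, test_index in splits:
    print(train_index, test_index)
```

To get reproducible folds like the tutorial's random_state=1, recent versions appear to expect shuffle=True alongside random_state, since a seed has no effect on unshuffled, consecutive folds.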
