Numpy: How to split/partition a dataset (array) into training and test datasets for, e.g., cross validation?

Question

What is a good way to split a numpy array randomly into training and testing/validation datasets? Something similar to the cvpartition or crossvalind functions in Matlab.

Recommended answer

If you want to split the data set once into two parts, you can use numpy.random.shuffle, or numpy.random.permutation if you need to keep track of the indices:

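import numpy
# x is your dataset (here: 100 samples, 5 features)
x = numpy.random.rand(100, 5)
# shuffle the rows in place; only the first axis is reordered
numpy.random.shuffle(x)
# first 80 rows for training, remaining 20 for testing
training, test = x[:80,:], x[80:,:]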

import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
# random ordering of the row indices; x itself is left untouched
indices = numpy.random.permutation(x.shape[0])
training_idx, test_idx = indices[:80], indices[80:]
training, test = x[training_idx,:], x[test_idx,:]
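
The permutation approach can be wrapped in a small helper so that the split size and the random seed are explicit. This is only a sketch: the function name `train_test_split_rows` and its `test_fraction` and `seed` parameters are made up for illustration, and it assumes a NumPy version that provides `numpy.random.default_rng`.

```python
import numpy as np

def train_test_split_rows(x, test_fraction=0.2, seed=None):
    """Split the rows of x into (training, test) with no shared rows.

    Hypothetical helper, not part of NumPy.
    """
    rng = np.random.default_rng(seed)
    n_samples = x.shape[0]
    n_test = int(round(n_samples * test_fraction))
    # permute the row indices, then cut the permutation in two
    indices = rng.permutation(n_samples)
    test_idx, training_idx = indices[:n_test], indices[n_test:]
    return x[training_idx], x[test_idx]

# example data with distinct rows, so the split is easy to inspect
x = np.arange(500).reshape(100, 5)
training, test = train_test_split_rows(x, test_fraction=0.2, seed=0)
print(training.shape, test.shape)  # (80, 5) (20, 5)
```

Passing a fixed `seed` makes the split reproducible between runs, which helps when comparing models on the same partition.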

There are many ways to repeatedly partition the same data set for cross validation. One strategy is to resample from the dataset with replacement (the two index sets below are drawn independently, so the same row can end up in both the training and the test set):

import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
# draw row indices with replacement; duplicates are expected
training_idx = numpy.random.randint(x.shape[0], size=80)
test_idx = numpy.random.randint(x.shape[0], size=20)
training, test = x[training_idx,:], x[test_idx,:]
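
If rows shared between the two sets are a concern, one common variant of resampling keeps only the rows that were never drawn for training (the "out-of-bag" rows) as the test set. The sketch below swaps in that variant; it is not what the answer's snippet does, and it assumes `numpy.random.default_rng` is available.

```python
import numpy as np

rng = np.random.default_rng(0)
# x is your dataset
x = rng.random((100, 5))
n_samples = x.shape[0]

# bootstrap sample: draw training rows with replacement
training_idx = rng.integers(0, n_samples, size=n_samples)
# out-of-bag rows: indices that were never drawn for training
test_idx = np.setdiff1d(np.arange(n_samples), training_idx)

training, test = x[training_idx], x[test_idx]
print(training.shape, test.shape)
```

The size of the test set varies from draw to draw here, because it depends on how many distinct rows the bootstrap sample happened to hit.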

Finally, scikit-learn (called scikits.learn when this answer was written) contains several cross validation methods (k-fold, leave-n-out, stratified-k-fold, ...); in current releases they live in the sklearn.model_selection module. For the docs you might need to look at the examples or the latest git repository, but the code looks solid.
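
For readers who want to see what k-fold partitioning involves without adding a dependency, here is a rough NumPy-only sketch. The generator `k_fold_indices` and its parameters are hypothetical names chosen for this example; this is not scikit-learn's implementation, and it assumes at least two folds.

```python
import numpy as np

def k_fold_indices(n_samples, n_folds=5, seed=None):
    """Yield (training_idx, test_idx) pairs, one pair per fold.

    Hypothetical helper; requires n_folds >= 2.
    """
    rng = np.random.default_rng(seed)
    indices = rng.permutation(n_samples)
    # array_split tolerates n_samples not being divisible by n_folds
    folds = np.array_split(indices, n_folds)
    for i in range(n_folds):
        test_idx = folds[i]
        # every fold except the i-th one is used for training
        training_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield training_idx, test_idx

# x is your dataset
x = np.random.default_rng(0).random((100, 5))
for training_idx, test_idx in k_fold_indices(x.shape[0], n_folds=5, seed=0):
    training, test = x[training_idx], x[test_idx]
    print(training.shape, test.shape)  # (80, 5) (20, 5) for each fold
```

Each row is used for testing exactly once across the folds, which is the property that separates k-fold from repeated random splits.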
