Numpy: How to split/partition a dataset (array) into training and test datasets for, e.g., cross validation?

Views: 3020  Posted: 2016/6/1 19:49:32  Tags: python, arrays, optimization, numpy

Question

What is a good way to split a numpy array randomly into training and testing/validation datasets? Something similar to the cvpartition or crossvalind functions in Matlab.

Recommended answer

If you want to split the data set once into two parts, you can use numpy.random.shuffle, or numpy.random.permutation if you need to keep track of the indices. Note that shuffle reorders the rows of x in place, while permutation leaves x untouched and returns a shuffled index array:

import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
numpy.random.shuffle(x)
training, test = x[:80,:], x[80:,:]

or

import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
indices = numpy.random.permutation(x.shape[0])
training_idx, test_idx = indices[:80], indices[80:]
training, test = x[training_idx,:], x[test_idx,:]
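
In practice a dataset often comes with a separate label array, and both have to be split with the same indices. The following is a minimal sketch of that idea built on the permutation approach above. The helper name train_test_split_arrays, the label array y, the test_fraction parameter, and the fixed seed are assumptions made for illustration and are not part of the original answer.

```python
import numpy as np

def train_test_split_arrays(x, y, test_fraction=0.2, seed=0):
    """Split x and y with one shared random permutation (hypothetical helper)."""
    # A seeded Generator makes the split reproducible between runs.
    rng = np.random.default_rng(seed)
    indices = rng.permutation(x.shape[0])
    n_test = int(round(test_fraction * x.shape[0]))
    test_idx, training_idx = indices[:n_test], indices[n_test:]
    # Indexing both arrays with the same indices keeps rows and labels aligned.
    return x[training_idx], x[test_idx], y[training_idx], y[test_idx]

# x is your dataset, y holds one (made-up) label per row
x = np.random.rand(100, 5)
y = np.arange(100)
x_train, x_test, y_train, y_test = train_test_split_arrays(x, y)
print(x_train.shape, x_test.shape)
```

Because the labels here are simply the row numbers, x[y_train] reproduces x_train, which is a quick way to check that rows and labels stayed aligned.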

There are many ways to repeatedly partition the same data set for cross validation. One strategy is to resample from the dataset with replacement. Because the training and test indices below are drawn independently, the same row can appear more than once and can land in both sets:

import numpy
# x is your dataset
x = numpy.random.rand(100, 5)
training_idx = numpy.random.randint(x.shape[0], size=80)
test_idx = numpy.random.randint(x.shape[0], size=20)
training, test = x[training_idx,:], x[test_idx,:]
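
If overlap between the two sets is a concern, one common variant is to draw only the training rows with replacement and test on the rows that were never drawn (often called out-of-bag rows). This is a sketch of that swapped-in technique, not the method from the answer above; the seed and the choice to draw as many training rows as there are samples are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
# x is your dataset
x = rng.random((100, 5))
n = x.shape[0]

# Draw n row indices with replacement for the training set.
training_idx = rng.integers(0, n, size=n)
# The test set is every row that was never drawn, so it cannot overlap.
test_idx = np.setdiff1d(np.arange(n), training_idx)
training, test = x[training_idx, :], x[test_idx, :]
```

The size of the test set varies from draw to draw, since it depends on how many distinct rows ended up in the training sample.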

Finally, scikits.learn (since renamed scikit-learn and imported as sklearn) contains several cross validation methods (k-fold, leave-n-out, stratified k-fold, ...). For the docs you might need to look at the examples or the latest git repository, but the code looks solid.
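
For readers who want to see what a k-fold partition does without an extra dependency, here is a minimal NumPy-only sketch. The generator name k_fold_indices and its parameters are hypothetical, and this is not the scikit-learn implementation; it assumes k is at least 2.

```python
import numpy as np

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (training_idx, test_idx) pairs for k folds (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Shuffle the row indices once, then cut them into k nearly equal folds.
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the i-th one forms the training set.
        training_idx = np.concatenate(folds[:i] + folds[i + 1:])
        yield training_idx, test_idx

# x is your dataset
x = np.random.rand(100, 5)
for training_idx, test_idx in k_fold_indices(x.shape[0], k=5):
    training, test = x[training_idx, :], x[test_idx, :]
    print(training.shape, test.shape)
```

Each row lands in exactly one test fold, so across the k iterations every sample is used for testing once and for training k - 1 times.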
