Matlab: split into train/valid/test sets and keep proportion

Question

I have a dataset with 12 columns + 1 target (binary) and about 4000 rows. I need to split it into train (70%), validation (20%) and test (10%) sets.

The dataset is heavily imbalanced (95% class 0 to 5% class 1), so I need to keep the target ratio the same in each subset.

I am able to split the dataset somehow, but I have no idea how to keep the ratio.

I am currently using the one from here

Recommended answer

If you have access to MATLAB's Statistics and Machine Learning Toolbox (formerly the Statistics Toolbox), you can use the cvpartition function.

From the MATLAB help on cvpartition:

c = cvpartition(group,'HoldOut',p) randomly partitions observations into a training set and a test set with stratification, using the class information in group; that is, both training and test sets have roughly the same class proportions as in group.

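You can apply the function twice to get three partitions, since each stratified hold-out split preserves the original class distribution. For a 70/20/10 split, first hold out 10% of the full data as the test set, then hold out 2/9 (about 22.2%) of the remaining 90% as the validation set; that is 20% of the full data, and what is left is the 70% training set.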
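Applying cvpartition twice could look roughly like the sketch below. It is only an illustration under stated assumptions: the data are randomly generated stand-ins for the asker's table, and the variable names (X, y, idxTrain, idxVal, idxTest) are hypothetical. It relies on the training and test methods of the cvpartition object, which return logical masks for a hold-out partition.

```matlab
% Hypothetical data standing in for the asker's table: 4000 rows,
% 12 feature columns, and a binary target with roughly 5% positives.
rng(0);                               % fix the seed so the split is reproducible
X = randn(4000, 12);
y = double(rand(4000, 1) < 0.05);

% Step 1: hold out 10% of all rows as the test set, stratified on y.
cTest   = cvpartition(y, 'HoldOut', 0.10);
idxTest = test(cTest);                % logical mask over all rows
idxRest = training(cTest);            % the remaining ~90%

% Step 2: split the remainder so that validation is 20% of the full data,
% i.e. a fraction 0.20 / 0.90 of the remaining rows.
restRows = find(idxRest);             % row numbers of the remaining data
cVal     = cvpartition(y(restRows), 'HoldOut', 0.20 / 0.90);

idxVal   = false(size(y));
idxVal(restRows(test(cVal))) = true;
idxTrain = false(size(y));
idxTrain(restRows(training(cVal))) = true;

% Build the three subsets from the masks.
Xtrain = X(idxTrain, :);  ytrain = y(idxTrain);
Xval   = X(idxVal, :);    yval   = y(idxVal);
Xtest  = X(idxTest, :);   ytest  = y(idxTest);

% Compare the positive-class fraction in each subset with the full data.
fprintf('all %.3f | train %.3f | val %.3f | test %.3f\n', ...
    mean(y), mean(ytrain), mean(yval), mean(ytest));
```

The three masks are disjoint and together cover every row, so they can index any other per-row variable as well. With only about 5% positives the test set holds few class-1 rows, so the printed fractions should be close to each other but will not match exactly.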
