Split the dataset into two subsets in MATLAB/Octave

Question

Split the dataset into two subsets, say "train" and "test", with the train set containing 80% of the data and the test set containing the remaining 20%.

Splitting means generating a logical index whose length equals the number of observations in the dataset, with 1 for a training sample and 0 for a test sample.

N = length(data.x)

Output: logical arrays called idxTrain and idxTest.

Recommended answer

This should do the trick:

% Generate sample data...
data = rand(32000,1);

% Calculate the number of training entries...
train_off = round(numel(data) * 0.8);

% Split data into training and test vectors...
train = data(1:train_off);
test = data(train_off+1:end);

But if you really want to rely on logical indexing, you can proceed as follows:

% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Calculate the number of training entries...
train_count = round(data_len * 0.8);

% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);

You can also use the randsample function to add some randomness to the extraction, but this won't guarantee an exact number of training and test elements every time you run the script:

% Generate sample data...
data = rand(32000,1);

% Generate a random true/false indexing with unequally weighted probabilities...
is_training = logical(randsample([0 1],numel(data),true,[0.2 0.8]));

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
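
As far as I know, randsample ships with the Statistics and Machine Learning Toolbox in MATLAB and with the statistics package in Octave, so it may not be available everywhere. A minimal sketch of the same idea using only the built-in rand function, assuming the same sample data as above (the fprintf line is only there to show that the fraction varies between runs):

```octave
% Generate sample data...
data = rand(32000, 1);

% Mark each observation as training with probability 0.8 (independent draws)...
is_training = rand(numel(data), 1) < 0.8;

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);

% The realized training fraction is close to, but rarely exactly, 0.8.
fprintf('training fraction: %.4f\n', mean(is_training));
```

Either way, the split is only approximately 80/20.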

You can avoid this problem by producing the correct number of training and test indices and then shuffling them with randperm-based indexing:

% Generate sample data...
data = rand(32000,1);
data_len = numel(data);

% Calculate the number of training entries...
train_count = round(data_len * 0.8);

% Create the logical indexing...
is_training = [true(train_count,1); false(data_len-train_count,1)];

% Shuffle the logical indexing...
is_training = is_training(randperm(data_len));

% Split data into training and test vectors...
train = data(is_training);
test = data(~is_training);
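
To match the output names requested in the question, the same approach can be sketched as follows. This assumes the dataset is a struct with a field x, as the question's N=length(data.x) suggests; the sample size of 1000 and the variable names nTrain, perm, xTrain and xTest are made up for illustration:

```octave
% Hypothetical dataset: a struct with a field x, as in the question.
data.x = rand(1000, 1);
N = numel(data.x);

% Number of training observations (80% of the data).
nTrain = round(0.8 * N);

% Start with an all-false index, then mark nTrain random positions as training.
idxTrain = false(N, 1);
perm = randperm(N);
idxTrain(perm(1:nTrain)) = true;

% The test index is the complement of the training index.
idxTest = ~idxTrain;

% Apply the logical indices to the data.
xTrain = data.x(idxTrain);
xTest = data.x(idxTest);
```

Because idxTest is defined as the complement of idxTrain, every observation lands in exactly one of the two subsets.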
