How to get the remainder of a dataset

Question

Below is an example of picking a set number of rows from a dataset using a random permutation. How can I create a new dataset from the remainder? In the example below I pick 49402 rows (roughly 10%) and create a dataset named UnseenTestdata; once these are chosen, I want the remaining rows to go into a new dataset called testdata.

pointsToPick = 49402;  %# Numbers to pick
rVec = randperm(494021);   %# Random permutation of datapoint indices (N=494021 in this case)  

UnseenTestdata = fulldata(rVec(1:pointsToPick),:); %# Random sample

In other words: fulldata minus UnseenTestdata = the remainder of the dataset, aptly named testdata.

fulldata has dimensions 494021x6, from which I choose 49402 rows at random (49402x6). I then need what is left of fulldata after removing UnseenTestdata.

Barnabas Szabolcs added the following test case as an answer:

fulldata = [1 2; 3 4; 5 6; 7 8];
rVec = randperm(4);  
pointsToPick=2;
unseen = fulldata(rVec(1:pointsToPick),:); 
testdata = fulldata(rVec(pointsToPick:length(rVec)),:); 

However, this does not work. I have taken a screen dump of the results:

In the screen dump, unseen contains 3,4 and 7,8; however, 7,8 also remains in testdata.

If fulldata =

1,2
3,4
5,6
7,8

and we choose two random rows, and in this case the rows in unseen are:

row
3,4
7,8

then the remainder should be:

1,2
5,6

However, in the screen dump from the example test, testdata contains the row:

7,8

which shows that the example test does not work.

Recommended answer

If I understand your question correctly, the solution is

testdata = fulldata(rVec((pointsToPick+1):length(rVec)),:);

A simple test case:

fulldata = [1 2; 3 4; 5 6; 7 8; 10 9];
rVec = randperm(5);  % one permutation per row of fulldata; e.g. [4 2 3 1 5] on one run
pointsToPick = 2;
unseen = fulldata(rVec(1:pointsToPick),:); % [7 8; 3 4]
% length(rVec) is 5
testdata = fulldata(rVec((pointsToPick+1):length(rVec)),:); % [5 6; 1 2; 10 9]

You can clearly see that, in a sense, fulldata = unseen (setplus) testdata: every row ends up in exactly one of the two. Note that we need the "+1" because a range such as 1:pointsToPick includes both endpoints, so index pointsToPick is already used by unseen and the remainder must start at pointsToPick+1. Arrays are also indexed from one, unlike in, say, C++, so the last index is length(rVec), not length(rVec)-1.
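The off-by-one in the earlier attempt can be seen directly on the index vectors. The following is a minimal sketch, assuming a fixed permutation (hard-coded here only so the result is reproducible) and the hypothetical variable names picked, buggyRest, and fixedRest:

```matlab
% Assumed fixed permutation for illustration; in practice use randperm(N).
rVec = [4 2 3 1 5];
pointsToPick = 2;

picked    = rVec(1:pointsToPick);         % indices used for unseen
buggyRest = rVec(pointsToPick:end);       % starts one position too early
fixedRest = rVec((pointsToPick+1):end);   % starts right after the picked block

% Indices shared between the picked block and each candidate remainder.
overlapBuggy = intersect(picked, buggyRest);   % one shared index: rVec(pointsToPick)
overlapFixed = intersect(picked, fixedRest);   % no shared indices

fprintf('buggy overlap: %d shared index(es)\n', numel(overlapBuggy));
fprintf('fixed overlap: %d shared index(es)\n', numel(overlapFixed));
```

The shared index in the buggy version is exactly rVec(pointsToPick), which is why one row of unseen also showed up in testdata.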

You can verify that things are correct using this:

isequal(sortrows([unseen; testdata]), sortrows(fulldata)) % should be true
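As an alternative to slicing the permutation twice, the same split can be sketched with a logical mask, which makes it hard for the two subsets to overlap. This is only an illustration on the small test matrix; the names nRows and isUnseen are assumptions introduced here, not part of the original answer.

```matlab
fulldata = [1 2; 3 4; 5 6; 7 8; 10 9];
nRows = size(fulldata, 1);          % number of data points
pointsToPick = 2;
rVec = randperm(nRows);             % random permutation of row indices

% Mark the randomly chosen rows, then split on the mask and its complement.
isUnseen = false(nRows, 1);
isUnseen(rVec(1:pointsToPick)) = true;

unseen   = fulldata(isUnseen, :);   % the random sample
testdata = fulldata(~isUnseen, :);  % everything that was not picked

% Sanity checks: the sizes add up and no row is lost or duplicated.
assert(size(unseen, 1) == pointsToPick);
assert(size(unseen, 1) + size(testdata, 1) == nRows);
assert(isequal(sortrows([unseen; testdata]), sortrows(fulldata)));
```

One difference from the indexing approach: with a mask, the rows of unseen and testdata keep their original order in fulldata rather than the permuted order, which may or may not matter for your use.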
