How to get the remainder of a dataset
Question
Below is an example of picking a set number of rows from a dataset using a random permutation. How can I create a new dataset from the remainder? For example, below I choose 49402 rows (roughly 10%) and create a dataset named UnseenTestdata. Once this is chosen, I want the remainder to go into a new dataset called testdata.
pointsToPick = 49402; %# Numbers to pick
rVec = randperm(494021); %# Random permutation of datapoint indices (N=494021 in this case)
UnseenTestdata = fulldata(rVec(1:pointsToPick),:); %# Random sample
fulldata minus UnseenTestdata = the remainder of the dataset, aptly named testdata.
The dimensions of fulldata are 494021x6, from which I choose 49402x6 at random. I then need to get what is left of fulldata after removing UnseenTestdata.
Barnabas Szabolcs added an answer with this test case:
fulldata = [1 2; 3 4; 5 6; 7 8];
rVec = randperm(4);
pointsToPick=2;
unseen = fulldata(rVec(1:pointsToPick),:);
testdata = fulldata(rVec(pointsToPick:length(rVec)),:);
However, this does not work. I have screen-dumped the results:
In the screen dump, unseen contains the rows 3,4 and 7,8, yet the row 7,8 also remains in testdata.
If fulldata =
1,2
3,4
5,6
7,8
and we choose two random rows, in this case the rows in unseen are:
3,4
7,8
then the remainder should be:
1,2
5,6
However, in the screen dump from the example test, testdata contains the row:
7,8
which shows that the example test does not work.
Recommended answer
If I understand your question correctly, the solution is
testdata = fulldata(rVec((pointsToPick+1):length(rVec)),:);
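Applied to the sizes from the question, the same split might look like the sketch below. The `rand` call is only a stand-in for the real fulldata (an assumption made so the snippet runs on its own), and `end` is used in place of `length(rVec)`.

```matlab
% Stand-in for the real 494021x6 dataset (assumption: random values).
fulldata = rand(494021, 6);
pointsToPick = 49402;                                    % roughly 10% of the rows

rVec = randperm(size(fulldata, 1));                      % random permutation of row indices
UnseenTestdata = fulldata(rVec(1:pointsToPick), :);      % first 49402 shuffled rows
testdata       = fulldata(rVec(pointsToPick+1:end), :);  % remaining 444619 rows
```

Taking the row count from `size(fulldata, 1)` rather than hard-coding 494021 keeps the permutation in step with the data if the dataset changes.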
A simple test case:
fulldata = [1 2; 3 4; 5 6; 7 8; 10 9];
rVec = randperm(5); % e.g. [4 2 3 1 5] on one run
pointsToPick = 2;
unseen = fulldata(rVec(1:pointsToPick),:); % [7 8; 3 4]
% length(rVec) is 5
testdata = fulldata(rVec((pointsToPick+1):length(rVec)),:); % [5 6; 1 2; 10 9]
You can clearly see that, in a sense, fulldata = unseen (set union) testdata.
Note that we need the "+1" because rVec(1:pointsToPick) already includes index pointsToPick, so the remainder has to start at pointsToPick+1. Arrays are indexed from one, unlike in, say, C++, so the last index is length, not length-1.
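The off-by-one can also be seen at the index level, without looking at the data at all. The following is a small sketch; the names `pickedIdx`, `buggyIdx` and `fixedIdx` are introduced here for illustration only and are not part of the original code.

```matlab
rVec = randperm(4);                       % random permutation of 1..4
pointsToPick = 2;

pickedIdx = rVec(1:pointsToPick);         % indices that go into unseen
buggyIdx  = rVec(pointsToPick:end);       % range from the question's test: starts one too early
fixedIdx  = rVec(pointsToPick+1:end);     % range from this answer

overlapBuggy = intersect(pickedIdx, buggyIdx);   % the single shared index rVec(pointsToPick)
overlapFixed = intersect(pickedIdx, fixedIdx);   % empty: the two parts are disjoint
```

With the buggy range, the row at `rVec(pointsToPick)` lands in both unseen and testdata, which matches the duplicated 7,8 row reported in the question.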
You can verify that everything is correct with:
isequal(sortrows([unseen; testdata]), sortrows(fulldata)) % should be true
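As an alternative that avoids the index arithmetic altogether, the split can be driven by a logical mask. This is a minimal sketch rather than part of the original answer; `isUnseen` and `n` are names introduced here for illustration.

```matlab
% Toy data with the same values as the test case above.
fulldata = [1 2; 3 4; 5 6; 7 8; 10 9];
pointsToPick = 2;

n = size(fulldata, 1);                   % number of rows
rVec = randperm(n);                      % random permutation of row indices

% Mark the picked rows in a logical mask, then split on the mask.
isUnseen = false(n, 1);
isUnseen(rVec(1:pointsToPick)) = true;

unseen   = fulldata(isUnseen, :);        % picked rows, in their original row order
testdata = fulldata(~isUnseen, :);       % every row that was not picked
```

Because each row is either marked or not, the two parts cannot overlap. The trade-off is that both parts come out in the original row order rather than in shuffled order.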