Sequential feature selection in MATLAB

Question

Can somebody explain how to use the MATLAB function sequentialfs?

It looks straightforward, but I do not know how to design a function handle for it.

Any clues?

Recommended answer

Here's a simpler example than the one in the documentation.

First let's create a very simple dataset. We have some class labels y. 500 are from class 0, and 500 are from class 1, and they are randomly ordered.

>> y = [zeros(500,1); ones(500,1)];
>> y = y(randperm(1000));

And we have 100 variables x that we want to use to predict y. 99 of them are just random noise, but one of them is highly correlated with the class label.

>> x = rand(1000,99);
>> x(:,100) = y + rand(1000,1)*0.1;

Now let's say we want to classify the points using linear discriminant analysis. If we were to do this directly without applying any feature selection, we would first split the data up into a training set and a test set:

>> xtrain = x(1:700, :); xtest = x(701:end, :);
>> ytrain = y(1:700); ytest = y(701:end);

Then we would classify them:

>> ypred = classify(xtest, xtrain, ytrain);

And finally we would measure the error rate of the prediction:

>> sum(ytest ~= ypred)
ans =
     0

In this case we get perfect classification.
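Note that sum(ytest ~= ypred) is a count of misclassified test points, not a rate. A minimal sketch of reporting it as a fraction as well, regenerating the same kind of data so the snippet runs on its own. The rng seed and the names nerr and errRate are illustrative assumptions, not part of the original answer:

```matlab
rng(0);                                    % assumed seed, only for repeatability
y = [zeros(500,1); ones(500,1)];
y = y(randperm(1000));
x = rand(1000,99);
x(:,100) = y + rand(1000,1)*0.1;           % the one informative variable

xtrain = x(1:700, :);  xtest = x(701:end, :);
ytrain = y(1:700);     ytest = y(701:end);

ypred   = classify(xtest, xtrain, ytrain); % linear discriminant analysis by default
nerr    = sum(ytest ~= ypred);             % number of misclassified test points
errRate = mean(ytest ~= ypred);            % the same information as a fraction in [0, 1]
```

The count is the form that the criterion function below returns, so it is worth keeping the two apart.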

To make a function handle to be used with sequentialfs, just put these pieces together:

>> f = @(xtrain, ytrain, xtest, ytest) sum(ytest ~= classify(xtest, xtrain, ytrain));
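The four arguments are supplied by sequentialfs itself: for each candidate feature subset it calls the handle with the training and test portions of x (restricted to the candidate columns) and of y. According to the sequentialfs documentation, the values returned are summed over the test sets and divided by the total number of test observations, which is why f returns a count rather than a rate. The same criterion can also be written as a named function. A sketch, where the name misclassCount is hypothetical and placing a local function at the end of a script assumes R2016b or later:

```matlab
% Same kind of synthetic data as in the answer.
y = [zeros(500,1); ones(500,1)];
y = y(randperm(1000));
x = rand(1000,99);
x(:,100) = y + rand(1000,1)*0.1;

% Pass the named criterion exactly like the anonymous handle f.
fs = sequentialfs(@misclassCount, x, y);

% Hypothetical named equivalent of f (local function, must come last in a script).
function err = misclassCount(xtrain, ytrain, xtest, ytest)
    ypred = classify(xtest, xtrain, ytrain);  % LDA fitted on the training part
    err   = sum(ytest ~= ypred);              % count of misclassified test points
end
```

A named function is easier to extend than a one-line anonymous handle, for example if the classifier needs several lines of setup.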

And pass all of them together into sequentialfs:

>> fs = sequentialfs(f,x,y)
fs =
  Columns 1 through 16
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 17 through 32
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 33 through 48
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 49 through 64
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 65 through 80
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 81 through 96
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 97 through 100
     0     0     0     1

The final 1 in the output indicates that variable 100 is, as expected, the best predictor of y among the variables in x.
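Reading a 100-element 0/1 vector by eye does not scale, so it may help to convert the mask to column indices and to look at the search history. A sketch under the same synthetic data. The variable names opts, selected and xsel are illustrative; the 'options' argument with statset and the second output are documented features of sequentialfs:

```matlab
y = [zeros(500,1); ones(500,1)];
y = y(randperm(1000));
x = rand(1000,99);
x(:,100) = y + rand(1000,1)*0.1;

f = @(xtrain, ytrain, xtest, ytest) sum(ytest ~= classify(xtest, xtrain, ytrain));

opts = statset('Display', 'iter');                 % print each step of the search
[fs, history] = sequentialfs(f, x, y, 'options', opts);

selected = find(fs);     % column indices of the chosen variables
xsel     = x(:, fs);     % data restricted to the chosen columns
disp(history.Crit);      % criterion value after each step of the search
```

Because fs is a logical mask, it can index the columns of x directly, and xsel can then be passed to classify in place of the full matrix.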

The example in the documentation for sequentialfs is a little more complex, mostly because the predicted class labels are strings rather than numerical values as above, so ~strcmp is used to calculate the error rate rather than ~=. In addition, it passes an explicit cross-validation partition to sequentialfs to estimate the error rate. In the call above no partition is given, so sequentialfs falls back on its default of 10-fold cross-validation.
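To connect the two, here is a sketch of the dataset above recast with string labels and an explicit partition. The labels 'a' and 'b' and the names ystr, fstr and c are made up for illustration; this is not the documentation's own example:

```matlab
y = [zeros(500,1); ones(500,1)];
y = y(randperm(1000));
x = rand(1000,99);
x(:,100) = y + rand(1000,1)*0.1;

% Hypothetical string labels: 'a' for class 0, 'b' for class 1.
ystr = cell(size(y));
ystr(y == 0) = {'a'};
ystr(y == 1) = {'b'};

% With cell-array labels, compare with strcmp instead of ~=.
fstr = @(xtrain, ytrain, xtest, ytest) ...
    sum(~strcmp(ytest, classify(xtest, xtrain, ytrain)));

c  = cvpartition(ystr, 'KFold', 10);        % explicit stratified 10-fold partition
fs = sequentialfs(fstr, x, ystr, 'cv', c);  % use it instead of the default
```

Passing the partition explicitly makes the folds reusable, so different criterion functions can be compared on the same splits.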
