Sequential feature selection Matlab


Problem Description

Can somebody explain how to use this function in Matlab, "sequentialfs"?

It looks straightforward, but I do not know how we can design a function handle for it.

Any clues?!

Recommended Answer

Here's a simpler example than the one in the documentation.

First let's create a very simple dataset. We have some class labels y. 500 are from class 0, and 500 are from class 1, and they are randomly ordered.

>> y = [zeros(500,1); ones(500,1)];
>> y = y(randperm(1000));

And we have 100 variables x that we want to use to predict y. 99 of them are just random noise, but one of them is highly correlated with the class label.

>> x = rand(1000,99);
>> x(:,100) = y + rand(1000,1)*0.1;

Now let's say we want to classify the points using linear discriminant analysis. If we were to do this directly without applying any feature selection, we would first split the data up into a training set and a test set:

>> xtrain = x(1:700, :); xtest = x(701:end, :);
>> ytrain = y(1:700); ytest = y(701:end);

Then we would classify them:

>> ypred = classify(xtest, xtrain, ytrain);

And finally we would measure the error rate of the prediction:

>> sum(ytest ~= ypred)
ans =
     0

and in this case we get perfect classification.
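
As a small aside that is not in the original answer, the same check can also be expressed as a misclassification rate rather than a raw count:

>> mean(ytest ~= ypred)   % fraction of misclassified test points; 0 here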

To make a function handle to be used with sequentialfs, just put these pieces together:

>> f = @(xtrain, ytrain, xtest, ytest) sum(ytest ~= classify(xtest, xtrain, ytrain));

And pass all of them together into sequentialfs:

>> fs = sequentialfs(f,x,y)
fs =
  Columns 1 through 16
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 17 through 32
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 33 through 48
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 49 through 64
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 65 through 80
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 81 through 96
     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
  Columns 97 through 100
     0     0     0     1

The final 1 in the output indicates that variable 100 is, as expected, the best predictor of y among the variables in x.
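
As a side note not in the original answer, the returned fs is a logical row vector, so it can be used directly to index the selected columns, or turned into column numbers with find:

>> xselected = x(:, fs);   % keep only the column(s) sequentialfs selected
>> idx = find(fs);         % the chosen column number(s); 100 in this run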

The example in the documentation for sequentialfs is a little more complex, mostly because the predicted class labels are strings rather than numerical values as above, so ~strcmp is used to calculate the error rate rather than ~=. In addition it makes use of cross-validation to estimate the error rate, rather than direct evaluation as above.
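
For reference, here is a rough sketch of what that cross-validated setup might look like. The labels ystr are a hypothetical cell array of string class names (not the numeric y above), and the call follows the general pattern of the sequentialfs documentation rather than reproducing it exactly:

>> c = cvpartition(ystr, 'KFold', 10);        % 10-fold partition over the labels
>> f = @(xtrain, ytrain, xtest, ytest) ...
       sum(~strcmp(ytest, classify(xtest, xtrain, ytrain)));
>> fs = sequentialfs(f, x, ystr, 'cv', c);    % sequentialfs runs the CV loop itself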
