Calculate cross validation for Generalized Linear Model in Matlab

Question

I am doing a regression using a Generalized Linear Model, and I am stuck on how to use the crossval function with it. My implementation so far:

% x is assumed to be a numeric matrix: inputs in columns 1-7, output in column 8

X = x(:,1:7);
Y = x(:,8);

cvpart = cvpartition(Y,'holdout',0.3);
Xtrain = X(training(cvpart),:);
Ytrain = Y(training(cvpart),:);
Xtest = X(test(cvpart),:);
Ytest = Y(test(cvpart),:);

mdl = GeneralizedLinearModel.fit(Xtrain,Ytrain,'linear','distr','poisson');

Ypred  = predict(mdl,Xtest);
res = (Ypred - Ytest);
RMSE_test = sqrt(mean(res.^2));

The code below calculates cross-validation for multiple regression, as obtained from this link. I want something similar for a Generalized Linear Model.

c = cvpartition(Y,'k',10);
regf=@(Xtrain,Ytrain,Xtest)(Xtest*regress(Ytrain,Xtrain));
cvMse = crossval('mse',X,Y,'predfun',regf)

Answer

You can either perform the cross-validation process manually (training a model for each fold, predicting the outcome, computing the error, then reporting the average across all folds), or you can use the CROSSVAL function, which wraps this whole procedure in a single call.

To give an example, I will first load and prepare a dataset (a subset of the cars dataset which ships with the Statistics Toolbox):

% load regression dataset
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
Y = MPG;

% remove instances with missing values
missIdx = isnan(Y) | any(isnan(X),2);
X(missIdx,:) = [];
Y(missIdx) = [];

clearvars -except X Y

Option 1

Here we manually partition the data for k-fold cross-validation using cvpartition (non-stratified). For each fold, we train a GLM on the training data, then use it to predict the output for the testing data. Next we compute and store the mean squared error for this fold. At the end, we report the RMSE averaged across all folds.

% partition data into 10 folds
K = 10;
cv = cvpartition(numel(Y), 'kfold',K);

mse = zeros(K,1);
for k=1:K
    % training/testing indices for this fold
    trainIdx = cv.training(k);
    testIdx = cv.test(k);

    % train GLM model
    mdl = GeneralizedLinearModel.fit(X(trainIdx,:), Y(trainIdx), ...
        'linear', 'Distribution','poisson');

    % predict regression output
    Y_hat = predict(mdl, X(testIdx,:));

    % compute mean squared error
    mse(k) = mean((Y(testIdx) - Y_hat).^2);
end

% average RMSE across k-folds
avrg_rmse = mean(sqrt(mse))
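
The per-fold errors can be summarized in two common ways: the mean of the per-fold RMSE values (as above), or the square root of the mean MSE. The following is a minimal, self-contained sketch that prints both for comparison. It makes a few assumptions that are not part of the original answer: it uses fitglm (the function-style counterpart of GeneralizedLinearModel.fit, available from R2013b), it fixes the random seed with rng(0) so the folds are repeatable, and the names rmse_mean_of_folds and rmse_of_mean_mse are chosen here for illustration.

```matlab
% Sketch only; assumes the Statistics Toolbox (carsmall, cvpartition, fitglm).
rng(0);                                   % assumption: fixed seed for repeatable folds

% load and clean the same dataset as above
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
Y = MPG;
keep = ~(isnan(Y) | any(isnan(X),2));     % rows without missing values
X = X(keep,:);
Y = Y(keep);

% 10-fold partition and per-fold MSE
K = 10;
cv = cvpartition(numel(Y), 'KFold', K);
mse = zeros(K,1);
for k = 1:K
    trainIdx = training(cv, k);
    testIdx  = test(cv, k);
    mdl = fitglm(X(trainIdx,:), Y(trainIdx), 'linear', ...
        'Distribution', 'poisson');
    Y_hat = predict(mdl, X(testIdx,:));
    mse(k) = mean((Y(testIdx) - Y_hat).^2);
end

% two ways to summarize the folds
rmse_mean_of_folds = mean(sqrt(mse));     % average of the per-fold RMSE values
rmse_of_mean_mse   = sqrt(mean(mse));     % square root of the average MSE
fprintf('mean of fold RMSEs: %.4f\n', rmse_mean_of_folds);
fprintf('sqrt of mean MSE:   %.4f\n', rmse_of_mean_mse);
```

The first number can never exceed the second (the square root is concave), so a small gap between the two summaries is expected and does not indicate a bug.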

Option 2

Here we simply call CROSSVAL with an appropriate function handle that computes the regression output given a set of train/test instances. See the doc page to understand the parameters.

% prediction function given training/testing instances
fcn = @(Xtr, Ytr, Xte) predict(...
    GeneralizedLinearModel.fit(Xtr,Ytr,'linear','distr','poisson'), ...
    Xte);

% perform cross-validation, and return average MSE across folds
mse = crossval('mse', X, Y, 'Predfun',fcn, 'kfold',10);

% compute root mean squared error
avrg_rmse = sqrt(mse)

You should get a result similar to Option 1. It will not be identical, both because the folds are drawn at random and because Option 1 averages the per-fold RMSE values while this code takes the square root of the average MSE.
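
If a repeatable result is wanted, one possible approach is to fix the random seed and hand CROSSVAL an explicit partition through its 'Partition' option instead of 'kfold'. The sketch below is self-contained and rests on assumptions not in the original answer: it uses fitglm in place of GeneralizedLinearModel.fit, seeds the generator with rng(1), and the variable names mse_cv and rmse_cv are illustrative.

```matlab
% Sketch only; assumes the Statistics Toolbox (carsmall, cvpartition, crossval, fitglm).
rng(1);                                   % assumption: fixed seed for repeatable folds

% load and clean the same dataset as above
load carsmall
X = [Acceleration Cylinders Displacement Horsepower Weight];
Y = MPG;
keep = ~(isnan(Y) | any(isnan(X),2));     % rows without missing values
X = X(keep,:);
Y = Y(keep);

% build the partition once so it can be reused across calls
cv = cvpartition(numel(Y), 'KFold', 10);

% prediction function given training/testing instances
fcn = @(Xtr, Ytr, Xte) predict( ...
    fitglm(Xtr, Ytr, 'linear', 'Distribution', 'poisson'), Xte);

% cross-validated MSE on the fixed partition, then its square root
mse_cv  = crossval('mse', X, Y, 'Predfun', fcn, 'Partition', cv);
rmse_cv = sqrt(mse_cv);
fprintf('cross-validated RMSE: %.4f\n', rmse_cv);
```

Reusing the same cv object in a second call gives the same folds, which makes it easier to compare different models or distributions on equal terms.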
