Removing Variables using PCA in R

Question

I tried searching for this but could not find the information. I am running a linear regression with 10 variables (1 y variable and 9 x variables). All the variables are correlated, and I want to see whether I need all 9 x variables. How do I use the results of a PCA to eliminate variables? I ran PCA on all 10 variables using prcomp() and got the following results:

Importance of components:
                          PC1     PC2     PC3     PC4     PC5     PC6     PC7     PC8     PC9     PC10
Standard deviation     0.1021 0.04005 0.03464 0.03114 0.02414 0.02047 0.01708 0.01425 0.01308 0.003287
Proportion of Variance 0.6567 0.10101 0.07555 0.06104 0.03668 0.02639 0.01838 0.01278 0.01078 0.000680
Cumulative Proportion  0.6567 0.75773 0.83328 0.89432 0.93100 0.95738 0.97576 0.98854 0.99932 1.000000

Rotation:
               PC1          PC2         PC3         PC4         PC5         PC6         PC7         PC8          PC9         PC10
 [1,] -0.219033940  0.009323363  0.14371969  0.06987706  0.19302513 -0.02648874  0.16654618 -0.06567080 -0.925393447  0.005948459
 [2,] -0.007661133 -0.027804546 -0.24045564  0.13997803  0.00461297 -0.13195868  0.13625008  0.05140013 -0.005668700 -0.939724900
 [3,] -0.053184446 -0.212036806 -0.26744318  0.36220366 -0.53094911  0.24356319 -0.04692857 -0.62944042 -0.084900337  0.051564259
 [4,] -0.188804651  0.062154139 -0.08807850  0.18886008  0.19969440 -0.59987987 -0.68882923 -0.20548388 -0.004509710  0.024501524
 [5,] -0.299789863  0.080676352 -0.62720621 -0.23335343  0.37274825  0.50767975 -0.23796461  0.03549668 -0.025233090  0.023917725
 [6,] -0.013478134 -0.052386807 -0.58015768  0.34394876 -0.01276741 -0.38994226  0.42009710  0.31887185  0.002157408  0.334375266
 [7,] -0.380565266  0.227200067  0.23992808  0.40306010  0.46135693  0.09059073  0.35930614 -0.34019038  0.342613874  0.015991214
 [8,] -0.432463682  0.037822199  0.20765408  0.45337044 -0.30497494  0.26299209 -0.26947304  0.57196490  0.008807625 -0.029461460
 [9,] -0.654931547  0.158646794 -0.01629962 -0.51083458 -0.39357245 -0.27198634  0.20326283 -0.08572653  0.083798804 -0.010738521
[10,] -0.250287731 -0.928894500  0.10639604 -0.08339656  0.20266163 -0.03955488  0.02948133  0.03827340  0.106117791  0.002154660

Recommended answer

So it sounds like you are facing a model selection problem: you want to choose the best variables without overfitting, correct?

PCA may not be the way to go for feature selection. Here's one discussion of it:

https://stats.stackexchange.com/questions/27300/

The usual purpose of PCA is dimensionality reduction, i.e. describing the relationships in your data using fewer dimensions than are actually present. A component that explains a lot of variance could be a good feature, but not necessarily; PCA is not exactly geared towards that purpose.
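
To make that concrete, here is a minimal sketch on the built-in iris data (a stand-in, not your data), running prcomp() on the predictors only, which is an assumption of this sketch. It is meant to show that each component is a weighted mix of all the original variables, so keeping fewer components does not by itself remove any variable from what you have to measure.

```r
# PCA on the four numeric iris measurements (a stand-in dataset, predictors only)
X <- iris[, 1:4]
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Proportion of variance explained by each component, shown cumulatively
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(cumsum(var_explained), 3))

# Loadings: every component has a weight on every original variable
print(round(pca$rotation, 3))

# Keeping the first two components gives 2 columns of scores,
# but computing them still requires all 4 original measurements
scores <- pca$x[, 1:2]
print(dim(scores))
```

The loadings matrix is the same kind of object as the "Rotation" table in the question: each column describes a component, and none of the columns singles out one variable to drop.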

If what you want to do is pare down the number of features in your model, I would suggest using an information criterion like the AIC. You can easily do this in R with the stepAIC function, like so:

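library(MASS)  # provides stepAIC()
# Full model: all main effects plus every two-way interaction
fit <- lm(Sepal.Length ~ .^2, data = iris)
# Backward elimination: repeatedly drop the term whose removal lowers AIC most
step <- stepAIC(fit, direction = "backward")
# Initial model, final model, and the terms removed along the way
step$anova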
>> Stepwise Model Path 
>> Analysis of Deviance Table
>> 
>> Initial Model:
>> Sepal.Length ~ (Sepal.Width + Petal.Length + Petal.Width + Species)^2
>> 
>> Final Model:
>> Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species + 
>>   Sepal.Width:Petal.Width + Petal.Length:Species + Petal.Width:Species

At each step it trims out another term, choosing the one whose removal lowers the AIC the most, and it stops when no removal lowers the AIC further. There is a lot more that goes into model selection, and a lot of things to consider and adjust, so this is not a prescriptive guide; I just wanted to bring it up as something to consider.
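
For a setup closer to yours (one response, several correlated predictors, main effects only), a rough sketch might look like the following. The mtcars data and the names full, reduced and dropped are stand-ins I picked for illustration, not anything from your data.

```r
library(MASS)  # provides stepAIC()

# mtcars is a stand-in: mpg is the response, the other 10 columns are predictors
full <- lm(mpg ~ ., data = mtcars)

# Backward elimination on main effects only; trace = FALSE suppresses the step log
reduced <- stepAIC(full, direction = "backward", trace = FALSE)

# Predictors that were in the full model but not in the selected one
dropped <- setdiff(attr(terms(full), "term.labels"),
                   attr(terms(reduced), "term.labels"))
print(dropped)

# AIC of both fits; the selected model should not have a higher AIC
print(c(full = AIC(full), reduced = AIC(reduced)))
```

Swapping in your own data frame and response name should be enough to try it; whether the selected model is the one to keep is still a judgment call.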
