Confidence intervals of loadings in principal components in R

Question

I am using the following code to run a principal component analysis on the first four columns of the iris data set with the prcomp function in R:

> prcomp(iris[1:4])
Standard deviations:
[1] 2.0562689 0.4926162 0.2796596 0.1543862

Rotation:
                     PC1         PC2         PC3        PC4
Sepal.Length  0.36138659 -0.65658877  0.58202985  0.3154872
Sepal.Width  -0.08452251 -0.73016143 -0.59791083 -0.3197231
Petal.Length  0.85667061  0.17337266 -0.07623608 -0.4798390
Petal.Width   0.35828920  0.07548102 -0.54583143  0.7536574

How can I get confidence intervals for these values in R? Is there a package that can do it? Thanks for your help.

Recommended answer

You could use bootstrapping for this. Resample the rows of your data with replacement, recompute the principal components on each resample, and record the loading you are interested in every time. Use the resulting empirical distribution to get your confidence intervals.
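To make the resampling step concrete, here is a minimal base-R sketch of the same idea without any extra package. The replicate count, the seed, and the sign-alignment step are assumptions added for illustration and are not part of the original answer. The alignment is there because the sign of a principal axis is arbitrary, so each replicate is flipped to agree with the full-data axis before its loading is recorded.

```r
# Manual bootstrap of one loading, using only base R.
set.seed(1)                      # arbitrary seed, for reproducibility only
X <- iris[1:4]
n <- nrow(X)
n_boot <- 2000                   # assumed replicate count; raise it for stabler tails

ref <- prcomp(X)$rotation[, 1]   # PC1 loadings from the full data

boot_loading <- replicate(n_boot, {
  idx <- sample.int(n, n, replace = TRUE)   # resample rows with replacement
  v <- prcomp(X[idx, ])$rotation[, 1]       # PC1 loadings of this resample
  # Flip the replicate if it points away from the full-data axis.
  if (sum(v * ref) < 0) v <- -v
  v[["Sepal.Length"]]
})

# Percentile interval: 2.5% and 97.5% quantiles of the empirical distribution.
ci <- quantile(boot_loading, c(0.025, 0.975))
print(ci)
```

The percentile interval used here is only one of several bootstrap interval types; it is chosen because it needs nothing beyond `quantile()`.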

The boot package makes this pretty easy.

Here is an example that computes a 95% confidence interval for the loading of Sepal.Length on the first principal component:

library(boot)

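# Return one loading: the rotation-matrix entry for variable vname on component pcnum.
getPrcStat <- function(samdf, vname, pcnum) {
  prcs <- prcomp(samdf[1:4])            # prcomp object; loadings are in $rotation
  return(prcs$rotation[vname, pcnum])   # pick out the loading we need
}

# Statistic for boot(): d holds the row indices of one bootstrap resample.
bootEst <- function(df, d) {
  sampledDf <- df[d, ]                  # resampled data frame
  return(getPrcStat(sampledDf, "Sepal.Length", 1))
}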

bootOut <- boot(iris,bootEst,R=10000)
boot.ci(bootOut,type=c("basic"))

The output is:

  BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
  Based on 10000 bootstrap replicates

  CALL : 
  boot.ci(boot.out = bootOut, type = c("basic"))

  Intervals : 
  Level      Basic         
  95%   ( 0.3364,  1.1086 )  
  Calculations and Intervals on Original Scale

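So with the usual basic bootstrap method we get a 95 percent confidence interval of 0.3364 to 1.1086. Note that the upper limit is above 1, which no loading can reach, because each column of the rotation matrix has unit length. The basic interval does not respect the range of the statistic, and the sign of a principal component is arbitrary, so resamples that return the component with a flipped sign can widen the interval considerably. There are plenty of other, more advanced statistical methods that can be used too, but you need to know what you are doing.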
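As one possible refinement, the sketch below returns all four PC1 loadings from the statistic, aligns the sign of every replicate with the full-data component, and asks `boot.ci` for both basic and percentile intervals. The helper name `pc1_loadings`, the reference vector `ref`, the seed, and `R = 2000` are assumptions made for illustration; the sign alignment is an addition and not part of the answer's code.

```r
library(boot)

# Full-data PC1 loadings, used only as a sign reference.
ref <- prcomp(iris[1:4])$rotation[, 1]

# Statistic for boot(): all four PC1 loadings of the resampled rows.
pc1_loadings <- function(data, idx) {
  v <- prcomp(data[idx, 1:4])$rotation[, 1]
  if (sum(v * ref) < 0) v <- -v   # undo an arbitrary sign flip
  v
}

set.seed(42)                      # arbitrary seed, for reproducibility only
boot_out <- boot(iris, pc1_loadings, R = 2000)

# index = 1 selects the first element of the statistic (Sepal.Length).
ci <- boot.ci(boot_out, type = c("basic", "perc"), index = 1)
print(ci)
```

Changing `index` to 2, 3, or 4 gives the intervals for the other three variables from the same set of replicates.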
