Omit NA and data imputation before doing PCA analysis using R

Views: 2254  Published: 2017/3/26 1:44:42  Tags: r, dataframe, pca, na, princomp


Problem description

I am trying to run a PCA using the princomp function in R.

The following is the example code:
mydf <- data.frame (
    A = c("NA", rnorm(10, 4, 5)), 
    B = c("NA", rnorm(9, 4, 5), "NA"),
    C =  c("NA", "NA", rnorm(8, 4, 5), "NA")
)

out <- princomp(mydf, cor = TRUE, na.action=na.exclude)

Error in cov.wt(z) : 'x' must contain finite values only
I tried to remove the NA values from the dataset, but that does not work:
ndnew <- mydf[complete.cases(mydf),]

                   A                  B                C
1                  NA                 NA               NA
2    1.67558617743171   1.28714736288378               NA
3   -1.03388645096478    9.8370942023751 10.9522215389562
4    7.10494481721949   14.7686678743866 4.06560213642725
5     13.966212462717   3.92061729913733 7.12875100279949
6   -1.91566982754146  0.842774330179978 5.26042516598668
7  0.0974919570675357    5.5264365812476 6.30783046905425
8    12.7384749395121   4.72439301946042  2.9318845479507
9    13.1859349108349 -0.546676530952666 9.98938028956806
10   4.97278207223239   6.95942086859593 5.15901566720956
11  -4.10115142119221                 NA               NA
Even if I could remove the NAs, it might not help, because every row or column has at least one missing value. Is there any R method that can impute the data before doing the PCA?



UPDATE: based on the answers:
> mydf <- data.frame (A = c(NA, rnorm(10, 4, 5)), B = c(NA, rnorm(9, 4, 5), NA),
+  C =  c(NA, NA, rnorm(8, 4, 5), NA))
> out <- princomp(mydf, cor = TRUE, na.action=na.exclude)
Error in cov.wt(z) : 'x' must contain finite values only

ndnew <- mydf[complete.cases(mydf),]
out <- princomp(ndnew, cor = TRUE, na.action=na.exclude)
This works, but the default na.action does not.

Is there any method that can impute the data? In my real data almost every column has missing values, so omitting NAs would leave me with ~ 0 rows or columns.
Solution
For na.action to have an effect, you need to explicitly supply a formula argument:
princomp(formula = ~., data = mydf, cor = TRUE, na.action=na.exclude)

# Call:
# princomp(formula = ~., data = mydf, na.action = na.exclude, cor = TRUE)
# 
# Standard deviations:
#    Comp.1    Comp.2    Comp.3 
# 1.3748310 0.8887105 0.5657149 
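To see what na.action actually changes once the formula interface is used, the following sketch compares na.exclude with na.omit on the same kind of data. The seed and the object names fit_exclude and fit_omit are assumptions added for illustration only; they are not part of the original post.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)

# Passing a formula as the first argument dispatches to princomp.formula,
# which builds a model frame and therefore honours na.action.
fit_exclude <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.exclude)
fit_omit    <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.omit)

# Both fits use only the 8 complete rows (rows 1, 2 and 11 contain NA).
fit_exclude$n.obs

# na.exclude pads the scores back to the original 11 rows, with NA
# in the rows that were dropped; na.omit returns the 8 used rows only.
nrow(fit_exclude$scores)
nrow(fit_omit$scores)
```

With na.exclude the scores stay aligned row by row with mydf, which is convenient when they need to be bound back onto the original data frame.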
The formula is needed because it triggers dispatch to princomp.formula, the only princomp method that does anything useful with na.action.
methods('princomp')
[1] princomp.default* princomp.formula*

names(formals(stats:::princomp.formula))
[1] "formula"   "data"      "subset"    "na.action" "..."  

names(formals(stats:::princomp.default))
[1] "x"      "cor"    "scores" "covmat" "subset" "..."   
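The formula approach still discards every incomplete row, which does not help when almost every row has a missing value. As a rough illustration of the imputation part of the question, here is a minimal base-R sketch that fills each NA with its column mean before calling princomp. This is a deliberately naive baseline and an assumption of this example, not something the answer above proposes; impute_mean is a hypothetical helper name. Mean imputation shrinks variances and distorts correlations, so packages such as missMDA or mice are probably a better fit for real data.

```r
set.seed(1)  # assumed seed, only so the sketch is reproducible
mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)

# Hypothetical helper: replace each NA with the mean of the observed
# values in the same column.
impute_mean <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

mydf_imputed <- as.data.frame(lapply(mydf, impute_mean))

# No missing values remain, so the default method runs on all 11 rows.
anyNA(mydf_imputed)
out <- princomp(mydf_imputed, cor = TRUE)
out$n.obs
summary(out)
```

Row 1 is missing in every column, so after imputation it sits exactly at the column means and carries no information; in practice such rows are usually better dropped before imputing.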


                        