Omit NA and data imputation before doing PCA analysis using R
Problem description
I am trying to do a PCA analysis using the princomp function in R. The following is the example code:

    mydf <- data.frame(
      A = c("NA", rnorm(10, 4, 5)),
      B = c("NA", rnorm(9, 4, 5), "NA"),
      C = c("NA", "NA", rnorm(8, 4, 5), "NA")
    )
    out <- princomp(mydf, cor = TRUE, na.action = na.exclude)
    Error in cov.wt(z) : 'x' must contain finite values only
I tried to remove the NA values from the dataset, but it does not work:

    ndnew <- mydf[complete.cases(mydf), ]
                        A                  B                C
     1                 NA                 NA               NA
     2   1.67558617743171   1.28714736288378               NA
     3  -1.03388645096478    9.8370942023751 10.9522215389562
     4   7.10494481721949   14.7686678743866 4.06560213642725
     5    13.966212462717   3.92061729913733 7.12875100279949
     6  -1.91566982754146  0.842774330179978 5.26042516598668
     7 0.0974919570675357    5.5264365812476 6.30783046905425
     8   12.7384749395121   4.72439301946042  2.9318845479507
     9   13.1859349108349 -0.546676530952666 9.98938028956806
    10   4.97278207223239   6.95942086859593 5.15901566720956
    11  -4.10115142119221                 NA               NA
Even if I could remove the NA values, it might not help, because every row or column has at least one missing value. Is there any R method that can impute the data before doing the PCA analysis?
UPDATE: based on the answers:
    > mydf <- data.frame(A = c(NA, rnorm(10, 4, 5)), B = c(NA, rnorm(9, 4, 5), NA),
    +                    C = c(NA, NA, rnorm(8, 4, 5), NA))
    > out <- princomp(mydf, cor = TRUE, na.action = na.exclude)
    Error in cov.wt(z) : 'x' must contain finite values only

    ndnew <- mydf[complete.cases(mydf), ]
    out <- princomp(ndnew, cor = TRUE, na.action = na.exclude)
This works, but the default na.action does not. Is there any method that can impute the data? In my real data almost every column has missing values, so omitting the NA values would leave me with ~0 rows or columns.

Solution

For na.action to have an effect, you need to explicitly supply a formula argument:

    princomp(formula = ~ ., data = mydf, cor = TRUE, na.action = na.exclude)
    # Call:
    # princomp(formula = ~., data = mydf, na.action = na.exclude, cor = TRUE)
    #
    # Standard deviations:
    #    Comp.1    Comp.2    Comp.3
    # 1.3748310 0.8887105 0.5657149
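To make the effect of na.action more concrete, here is a minimal sketch comparing na.exclude with na.omit through the formula interface. The seed and the object names pca_excl and pca_omit are illustrative choices, not part of the original answer; the data frame follows the corrected definition from the question's update.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)

# Both calls fit the PCA on the complete rows only (rows 3 to 10 here).
pca_excl <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.exclude)
pca_omit <- princomp(~ ., data = mydf, cor = TRUE, na.action = na.omit)

# na.exclude pads the scores with NA rows so they line up with mydf;
# na.omit simply drops the incomplete rows from the scores.
nrow(pca_excl$scores)
nrow(pca_omit$scores)
```

Both fits use the same complete rows, so the component standard deviations agree; the difference is only in whether the scores keep one row per original observation.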
The formula is needed because it triggers dispatch of princomp.formula, the only princomp method that does anything useful with na.action.

    methods('princomp')
    [1] princomp.default* princomp.formula*

    names(formals(stats:::princomp.formula))
    [1] "formula"   "data"      "subset"    "na.action" "..."

    names(formals(stats:::princomp.default))
    [1] "x"      "cor"    "scores" "covmat" "subset" "..."
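The answer above covers na.action but not the imputation part of the question. As a rough sketch only, and not something the answer itself proposes, column-mean imputation can be done in base R before calling princomp. The helper name impute_mean and the seed are hypothetical choices for illustration. Mean imputation shrinks the variance of each column and can distort correlations, so dedicated imputation packages (missMDA and mice are two that exist on CRAN) may be more appropriate for real data.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

mydf <- data.frame(
  A = c(NA, rnorm(10, 4, 5)),
  B = c(NA, rnorm(9, 4, 5), NA),
  C = c(NA, NA, rnorm(8, 4, 5), NA)
)

# Hypothetical helper: replace each NA with the mean of the observed values.
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# Apply the helper column by column and rebuild the data frame.
mydf_imp <- as.data.frame(lapply(mydf, impute_mean))

# No NA values remain, so the default method works on all 11 rows.
pca_imp <- princomp(mydf_imp, cor = TRUE)
nrow(pca_imp$scores)
```

With this approach every row is kept, at the cost of treating the imputed values as if they had been observed.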