Adding principal components as variables to a data frame


Question

I am working in R with a dataset of 10000 data points and 100 variables. Unfortunately, the variables I have do not describe the data well. I carried out a PCA using prcomp(), and the first 3 PCs seem to account for most of the variability in the data. As far as I understand, a principal component is a combination of the original variables; it therefore has a value for each data point and can be treated as a new variable. Can I add these principal components to my data as 3 new variables? I need them for further analysis.

Reproducible dataset:

set.seed(144)
x <- data.frame(matrix(rnorm(2^10*12), ncol=12))
y <- prcomp(formula = ~., data=x, center = TRUE, scale = TRUE, na.action = na.omit)

Recommended answer

The PC scores are stored in the element x of the prcomp() result.

str(y)

List of 6
 $ sdev    : num [1:12] 1.08 1.06 1.05 1.04 1.03 ...
 $ rotation: num [1:12, 1:12] -0.0175 -0.1312 0.3284 -0.4134 0.2341 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:12] "X1" "X2" "X3" "X4" ...
  .. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:12] 0.02741 -0.01692 -0.03228 -0.03303 0.00122 ...
  ..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
 $ scale   : Named num [1:12] 0.998 1.057 1.019 1.007 0.993 ...
  ..- attr(*, "names")= chr [1:12] "X1" "X2" "X3" "X4" ...
 $ x       : num [1:1024, 1:12] 1.023 -1.213 0.167 -0.118 -0.186 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:1024] "1" "2" "3" "4" ...
  .. ..$ : chr [1:12] "PC1" "PC2" "PC3" "PC4" ...
 $ call    : language prcomp(formula = ~., data = x, na.action = na.omit, center = TRUE, scale = TRUE)
 - attr(*, "class")= chr "prcomp"
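
To make it concrete what y$x holds, here is a minimal sketch that rebuilds the scores by hand from the center, scale and rotation elements listed above, and also computes the share of variance carried by the first three PCs from sdev. The names manual_scores, scores_match and var_share are introduced only for this illustration, and scale. is spelled out in full (the original call relies on partial matching of scale).

```r
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))
y <- prcomp(~ ., data = x, center = TRUE, scale. = TRUE)

# Centre and scale the original variables with the values stored in y,
# then project them onto the loadings in y$rotation.
manual_scores <- scale(as.matrix(x), center = y$center, scale = y$scale) %*% y$rotation

# Compare the hand-computed scores with y$x, ignoring dimnames.
scores_match <- isTRUE(all.equal(unname(manual_scores), unname(y$x)))
print(scores_match)

# Proportion of total variance per component, and the cumulative
# share covered by the first three PCs.
var_share <- y$sdev^2 / sum(y$sdev^2)
print(round(cumsum(var_share)[1:3], 3))
```

Each column of y$x is therefore a linear combination of the standardized original variables, which is why it can be used like any other numeric variable.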

You can get them with y$x and then choose the columns you need.

x.new <- cbind(x, y$x[, 1:3])
str(x.new)

'data.frame':   1024 obs. of  15 variables:
 $ X1 : num  1.14 2.38 0.684 1.785 0.313 ...
 $ X2 : num  -0.689 0.446 -0.72 -3.511 0.36 ...
 $ X3 : num  0.722 0.816 0.295 -0.48 0.566 ...
 $ X4 : num  1.629 0.738 0.85 1.057 0.116 ...
 $ X5 : num  -0.737 -0.827 0.65 -0.496 -1.045 ...
 $ X6 : num  0.347 0.056 -0.606 1.077 0.257 ...
 $ X7 : num  -0.773 1.042 2.149 -0.599 0.516 ...
 $ X8 : num  2.05511 0.4772 0.18614 0.02585 0.00619 ...
 $ X9 : num  -0.0462 1.3784 -0.2489 0.1625 0.6137 ...
 $ X10: num  -0.709 0.755 0.463 -0.594 -1.228 ...
 $ X11: num  -1.233 -0.376 -2.646 1.094 0.207 ...
 $ X12: num  -0.44 -2.049 0.315 0.157 2.245 ...
 $ PC1: num  1.023 -1.213 0.167 -0.118 -0.186 ...
 $ PC2: num  1.2408 0.6077 1.1885 3.0789 0.0797 ...
 $ PC3: num  -0.776 -1.41 0.977 -1.343 0.987 ...
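
One caveat worth sketching: the call in the question uses na.action = na.omit, so if the real data contain missing values, the rows with NAs are dropped from y$x and cbind() would no longer line up with the original data frame. A possible workaround, assuming the formula interface of prcomp() is used, is na.action = na.exclude, which pads the scores with NA rows. The objects x_na, y_na and x_na_new below are hypothetical and exist only for this illustration.

```r
set.seed(144)
x <- data.frame(matrix(rnorm(2^10 * 12), ncol = 12))

# Hypothetical copy of the data with two missing values.
x_na <- x
x_na[c(5, 20), "X3"] <- NA

# na.exclude keeps one score row per input row; incomplete rows get NA scores.
y_na <- prcomp(~ ., data = x_na, center = TRUE, scale. = TRUE,
               na.action = na.exclude)

# The row counts agree, so the scores can be bound on directly.
x_na_new <- cbind(x_na, y_na$x[, 1:3])
print(dim(x_na_new))
print(x_na_new[c(4, 5, 6), c("X3", "PC1", "PC2", "PC3")])
```

Rows 5 and 20 then carry NA in PC1 to PC3 while every other row keeps its score. If the data have no missing values, the plain cbind() shown above is sufficient.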
