How to create factors from factanal?

Problem description

When performing a factor analysis using factanal, the usual result is a loadings table plus some other information. Is there a direct way to use these loadings to create a matrix / data.frame of factors? For example, to use them in a regression analysis later on.

The purpose of this is to obtain variables for subsequent modeling. I only know of factor scores, but suggestions / pointers to other terminology are welcome :)

Joris Meys' answer is basically what I was asking for. It does move my question in a direction that might be better suited to statsoverflow, but I will keep it here for now, because the right group of people is discussing the solution:

What's the benefit of the regression-based scores? The result of the product (ML) is highly correlated with the factors... Honestly, I wonder why the difference is that big in my case?

fa$scores                   # the correct solution
fac <- m1 %*% loadings(fa)  # the answer to your question
diag(cor(fac, fa$scores))
# returns:
#   Factor1   Factor2   Factor3
# 0.8309343 0.8272019 0.8070837

Recommended answer

You asked how to use the loadings to construct scores. Your solution, although correct, is not doing that. It uses a regression method (alternatively, you can use Bartlett's method), which imposes the restriction that the scores are uncorrelated, centered around 0 and have variance = 1. These are hence not the same factors as one would obtain by using F = ML, with F the factor matrix, M the original matrix and L the loading matrix.

A demonstration with the example from the help files:

v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)

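# Fit a 3-factor model and ask factanal for regression (Thomson) scores
fa <- factanal(m1, factors = 3, scores = "regression")

fa$scores                   # the correct solution

fac <- m1 %*% loadings(fa)  # the answer to your question (naive product M %*% L)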

These are clearly different values.
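One quick way to see how the two sets of scores differ is to compare their column means and standard deviations. The sketch below repeats the help-file data so that it runs on its own; the name `score_summary` is only illustrative and is not part of the original answer. The regression scores are built from scaled data, so their means are zero up to rounding, whereas the naive product works on the raw, uncentered variables.

```r
# Data from the factanal help-file example used above
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fa  <- factanal(m1, factors = 3, scores = "regression")
fac <- m1 %*% unclass(loadings(fa))  # naive product of raw data and loadings

# Column means and standard deviations of each set of scores
# (score_summary is an illustrative name, not from the original answer)
score_summary <- rbind(
  regression_mean = colMeans(fa$scores),
  naive_mean      = colMeans(fac),
  regression_sd   = apply(fa$scores, 2, sd),
  naive_sd        = apply(fac, 2, sd)
)
print(round(score_summary, 3))
```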

Edit: This has to do with the fact that the Thomson regression scores are based on scaled variables and take the correlation matrix into account. If you calculated the scores by hand, you would do:

> fac2 <- scale(m1) %*% solve(cor(m1)) %*% loadings(fa)
> all.equal(fa$scores,as.matrix(fac2))
[1] TRUE
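Bartlett's method, mentioned earlier as the alternative, can be reproduced by hand in the same spirit. The sketch below is not from the original answer. It assumes the standard weighted least-squares form of Bartlett's scores, Z Ψ⁻¹ Λ (Λ' Ψ⁻¹ Λ)⁻¹, with Z the scaled data, Λ the loadings and Ψ the diagonal matrix of uniquenesses. The names `fa_b`, `L`, `u`, `Z`, `W` and `bart` are illustrative.

```r
# Data from the factanal help-file example used above
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

# Ask factanal for Bartlett scores instead of regression scores
fa_b <- factanal(m1, factors = 3, scores = "Bartlett")

L <- unclass(loadings(fa_b))  # loading matrix Lambda (6 x 3)
u <- fa_b$uniquenesses        # diagonal of Psi (length 6)
Z <- scale(m1)                # centered and scaled data

W    <- L / u                 # Psi^{-1} Lambda: row i of L divided by u[i]
bart <- Z %*% W %*% solve(t(L) %*% W)

# Compare with the scores returned by factanal, ignoring attributes
all.equal(unclass(fa_b$scores), bart, check.attributes = FALSE)
```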

For more information, see this review.

And to show you why this is important: if you calculate the scores the "naive" way, your scores are actually correlated, and that is what you wanted to get rid of in the first place:

> round(cor(fac),2)
        Factor1 Factor2 Factor3
Factor1    1.00    0.79    0.81
Factor2    0.79    1.00    0.82
Factor3    0.81    0.82    1.00

> round(cor(fac2),2)
        Factor1 Factor2 Factor3
Factor1       1       0       0
Factor2       0       1       0
Factor3       0       0       1
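Since the question was about using the factors in a later regression, here is a minimal sketch of that step. The response `y` is simulated and purely hypothetical (the original post does not include one), and the names `dat` and `fit` are illustrative. Note that factanal needs the raw data matrix, not just a covariance matrix, to return scores.

```r
# Data from the factanal help-file example used above
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

fa <- factanal(m1, factors = 3, scores = "regression")

# Hypothetical response variable, simulated only to make the sketch runnable
set.seed(1)
y <- rnorm(nrow(m1))

# Put the factor scores into a data.frame next to the response
dat <- data.frame(y = y, fa$scores)

# Use the factor scores as predictors in an ordinary linear model
fit <- lm(y ~ Factor1 + Factor2 + Factor3, data = dat)
summary(fit)
```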
