Factor Loadings using sklearn
Question
I want the correlations between individual variables and principal components in Python. I am using PCA in sklearn. I don't understand how I can obtain the loading matrix after decomposing my data. My code is here:
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
data, y = iris.data, iris.target
pca = PCA(n_components=2)
transformed_data = pca.fit(data).transform(data)
eigenValues = pca.explained_variance_ratio_  # note: variance ratios, not raw eigenvalues
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html doesn't mention how this can be achieved.
Answer
I think that @RickardSjogren is describing the eigenvectors, while @BigPanda is giving the loadings. There's a big difference: Loadings vs eigenvectors in PCA: when to use one or another?.
I created this PCA class with a loadings method.
Loadings, as given by `pca.components_ * np.sqrt(pca.explained_variance_)`, are more analogous to coefficients in a multiple linear regression. I don't use `.T` here because in the PCA class linked above, the components are already transposed. `numpy.linalg.svd` produces `u, s, and vt`, where `vt` is the Hermitian transpose, so you first need to get back to `v` with `vt.T`.
There is also one other important detail: the signs (positive/negative) on the components and loadings in `sklearn.PCA` may differ from packages such as R. More on that here: In sklearn.decomposition.PCA, why are components_ negative?.
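Because the sign of each component is mathematically arbitrary, one way to make results comparable across libraries is to normalize them to a fixed convention. A minimal sketch, where the particular convention (force the largest-magnitude entry of each component to be positive) is my own assumption, not something sklearn or R guarantees:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

pca = PCA(n_components=2).fit(load_iris().data)
components = pca.components_.copy()

# Flip any component whose largest-magnitude entry is negative, so the
# same sign convention holds regardless of what the solver returned.
for i, comp in enumerate(components):
    if comp[np.argmax(np.abs(comp))] < 0:
        components[i] = -comp
```

Any loadings derived from the flipped components should be flipped the same way so the two stay consistent.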