scikit-learn PCA: matrix transformation produces PC estimates with flipped signs
Question
I'm using scikit-learn to perform PCA on this dataset. The scikit-learn documentation states:
Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
The problem is that I don't think I'm using different estimator objects, yet the signs of some of my PCs are flipped compared with the results from SAS's PROC PRINCOMP procedure.
For the first observation in the dataset, the SAS PCs are:
PC1 PC2 PC3 PC4 PC5
2.0508 1.9600 -0.1663 0.2965 -0.0121
From scikit-learn, I get the following (which are very close in magnitude):
PC1 PC2 PC3 PC4 PC5
-2.0536 -1.9627 -0.1666 -0.297 -0.0122
Here is what I'm doing:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
sourcef = pd.read_csv('C:/mydata.csv')
frame = pd.DataFrame(sourcef)
# Some pandas evals, regressions, etc... that I'm not showing
# but not affecting the matrix
# Make sure we are working with the proper data -- drop the response variable
cols = [col for col in frame.columns if col not in ['response']]
# Separate out the data matrix from the response variable vector
# into numpy arrays
frame2_X = frame[cols].values
frame2_y = frame['response'].values
# Standardize the values
X_means = np.mean(frame2_X,axis=0)
X_stds = np.std(frame2_X,axis=0)
y_mean = np.mean(frame2_y)
y_std = np.std(frame2_y)
frame2_X_stdz = np.copy(frame2_X)
frame2_y_stdz = frame2_y.astype(np.float32, copy=True)
for (x,y), value in np.ndenumerate(frame2_X_stdz):
    frame2_X_stdz[x][y] = (value - X_means[y])/X_stds[y]
for index, value in enumerate(frame2_y_stdz):
    frame2_y_stdz[index] = (float(value) - y_mean)/y_std
# Show the first 5 elements of the first standardized column, to verify
print(frame2_X_stdz[:,0][:5])
# Show the first 5 elements of the standardized response vector, to verify
print(frame2_y_stdz[:5])
These check out:
[ 0.9508 -0.5847 -0.2797 -0.4039 -0.598 ]
[ 1.0726 -0.5009 -0.0942 -0.1187 -0.8043]
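The element-wise loops above can also be written with NumPy broadcasting. This is only a minimal sketch: the array X is made-up stand-in data for frame2_X, and standardize is a hypothetical helper name, not part of the original code.

```python
import numpy as np

# Hypothetical stand-in for frame2_X (rows = observations, columns = variables).
X = np.array([[1.0, 10.0],
              [2.0, 14.0],
              [3.0, 12.0],
              [6.0, 20.0]])

def standardize(X, ddof=0):
    """Center each column and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=ddof)

X_stdz = standardize(X)            # population std (ddof=0), like np.std in the loop
X_stdz_sample = standardize(X, 1)  # sample std (ddof=1)
```

The choice of ddof rescales every column by the same constant factor, so it can change the magnitude of the PC scores slightly but never their signs.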
Continuing on...
# Create a PCA object
pca = PCA()
pca.fit(frame2_X_stdz)
# Create the matrix of PC estimates
pca.transform(frame2_X_stdz)
Here is the output from the last step:
Out[16]: array([[-2.0536, -1.9627, -0.1666, -0.297 , -0.0122],
[ 1.382 , -0.382 , -0.5692, -0.0257, -0.0509],
[ 0.4342, 0.611 , 0.2701, 0.062 , -0.011 ],
...,
[ 0.0422, 0.7251, -0.1926, 0.0089, 0.0005],
[ 1.4502, -0.7115, -0.0733, 0.0013, -0.0557],
[ 0.258 , 0.3684, 0.1873, 0.0403, 0.0042]])
I've tried replacing pca.fit() and pca.transform() with pca.fit_transform(), but I end up with the same results.
What am I doing wrong here that I'm getting PCs with the signs flipped?
Recommended answer
You're not doing anything wrong.
The documentation is warning you that repeated calls to fit may yield principal components with different signs; it says nothing about how they relate to another PCA implementation.
Flipped signs on some or all components don't make the result wrong. The result is right as long as it fulfills the definition: each component is chosen such that it captures the maximum amount of variance in the data (subject to being orthogonal to the earlier ones), and a direction v and its negation -v capture exactly the same variance. As it stands, the projection you got is simply mirrored along some axes; it still fulfills the definition and is thus correct.
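To illustrate why a mirrored component is equally valid, here is a small sketch that computes PCA through NumPy's SVD on made-up data (not the dataset from the question), flips one axis, and checks that the variance captured and the reconstruction are unchanged. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up data: 200 observations of 3 correlated variables, then centered.
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.1],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.2]])
X -= X.mean(axis=0)

# PCA via SVD: rows of Vt are the principal axes, X @ Vt.T are the scores.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T

# Flip the sign of the first axis and recompute the scores.
Vt_flipped = Vt.copy()
Vt_flipped[0] *= -1
scores_flipped = X @ Vt_flipped.T

# Same variance per component, and the same reconstruction of X.
same_variance = np.allclose(scores.var(axis=0), scores_flipped.var(axis=0))
same_reconstruction = np.allclose(scores @ Vt, scores_flipped @ Vt_flipped)
```

Both checks hold because negating an axis negates the matching score column, and the two sign changes cancel in the variance and in the product that rebuilds X.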
If, beyond correctness, you're worried about consistency between implementations, you can simply multiply the affected components by -1 when necessary.
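One way to do that flipping systematically is to compare each column of scores against a reference and negate the columns that point the opposite way. The sketch below assumes a hypothetical helper, align_signs, which is not part of scikit-learn; it is demonstrated on the first observation quoted in the question.

```python
import numpy as np

def align_signs(scores, reference):
    """Flip columns of `scores` so each agrees in direction with `reference`.

    Hypothetical helper: both arrays are (n_samples, n_components) and their
    columns are assumed to describe the same components up to sign.
    """
    scores = np.asarray(scores, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # The sign of the column-wise dot product tells whether directions agree.
    signs = np.sign(np.sum(scores * reference, axis=0))
    signs[signs == 0] = 1.0  # leave columns with no clear direction untouched
    return scores * signs, signs

# First observation from the question: scikit-learn vs. SAS.
sk = np.array([[-2.0536, -1.9627, -0.1666, -0.297, -0.0122]])
sas = np.array([[2.0508, 1.9600, -0.1663, 0.2965, -0.0121]])
aligned, signs = align_signs(sk, sas)
# signs is [-1., -1.,  1., -1.,  1.]: PC1, PC2 and PC4 are flipped
```

In practice the comparison would use all observations rather than one row. To keep a fitted scikit-learn estimator consistent with the aligned scores, the same signs would be applied to the rows of pca.components_, for example pca.components_ * signs[:, None].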