scikit-learn PCA: matrix transformation produces PC estimates with flipped signs

Views: 121 Published: 2020/7/31 4:06:59 Tags: python scikit-learn pca

Problem description

I'm using scikit-learn to perform PCA on this dataset. The scikit-learn documentation states:

Due to implementation subtleties of the Singular Value Decomposition (SVD), which is used in this implementation, running fit twice on the same matrix can lead to principal components with signs flipped (change in direction). For this reason, it is important to always use the same estimator object to transform data in a consistent fashion.
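
The sign ambiguity described in that passage can be seen directly in the SVD: negating a left singular vector together with its matching right singular vector leaves the product unchanged. Below is a minimal sketch with NumPy; the random matrix and the seed are assumptions standing in for real data.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))     # stand-in data, not the dataset from the question
X = X - X.mean(axis=0)          # center the columns, as PCA does

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the sign of the first component in both factors.
U_flipped = U.copy()
Vt_flipped = Vt.copy()
U_flipped[:, 0] *= -1
Vt_flipped[0, :] *= -1

# Both factorizations reconstruct X equally well.
recon_original = U @ np.diag(s) @ Vt
recon_flipped = U_flipped @ np.diag(s) @ Vt_flipped
same_reconstruction = np.allclose(recon_original, recon_flipped)

# The scores (projections onto the components) differ only in the sign
# of the column that was flipped.
scores = X @ Vt.T
scores_flipped = X @ Vt_flipped.T

print(same_reconstruction)
print(np.allclose(scores[:, 0], -scores_flipped[:, 0]))
```

Both sets of vectors are valid principal axes, so two correct implementations can disagree on the sign of any component.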

The problem is that I don't think I'm using different estimator objects, yet the signs of some of my PCs are flipped compared with the results from SAS's PROC PRINCOMP procedure.

For the first observation in the dataset, the SAS PCs are:

PC1      PC2      PC3       PC4      PC5
2.0508   1.9600   -0.1663   0.2965   -0.0121

From scikit-learn, I get the following (very close in magnitude):

PC1      PC2      PC3       PC4      PC5
-2.0536  -1.9627  -0.1666   -0.297   -0.0122
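
To make the comparison concrete, the two rows above can be checked component by component. This sketch uses only the first-observation values quoted in the two tables; the variable names are illustrative.

```python
import numpy as np

# First-observation scores copied from the two tables above.
sas = np.array([2.0508, 1.9600, -0.1663, 0.2965, -0.0121])
skl = np.array([-2.0536, -1.9627, -0.1666, -0.297, -0.0122])

# +1 where the signs agree, -1 where they are flipped.
sign_agreement = np.sign(sas) * np.sign(skl)
flipped = [f"PC{i + 1}" for i in np.flatnonzero(sign_agreement < 0)]

# Compare magnitudes only, ignoring sign.
max_abs_diff = np.max(np.abs(np.abs(sas) - np.abs(skl)))

print(flipped)        # components whose sign differs between the two tools
print(max_abs_diff)   # largest difference in absolute value
```

For this observation the flipped components are PC1, PC2 and PC4, while PC3 and PC5 have the same sign in both outputs.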

Here's what I'm doing:

import pandas as pd
import numpy  as np
from sklearn.decomposition import PCA

sourcef = pd.read_csv('C:/mydata.csv')
frame = pd.DataFrame(sourcef)

# Some pandas evals, regressions, etc... that I'm not showing
# but not affecting the matrix

# Make sure we are working with the proper data -- drop the response variable
cols = [col for col in frame.columns if col not in ['response']]

# Separate out the data matrix from the response variable vector 
# into numpy arrays
frame2_X = frame[cols].values
frame2_y = frame['response'].values

# Standardize the values
X_means = np.mean(frame2_X,axis=0)
X_stds  = np.std(frame2_X,axis=0)

y_mean = np.mean(frame2_y)
y_std  = np.std(frame2_y)

# Broadcasting standardizes every column at once; the division also
# promotes integer data to float instead of truncating it
frame2_X_stdz = (frame2_X - X_means) / X_stds
frame2_y_stdz = ((frame2_y - y_mean) / y_std).astype(np.float32)

# Show the first 5 elements of the standardized values, to verify
print(frame2_X_stdz[:, 0][:5])

# Show the first 5 lines from the standardized response vector, to verify
print(frame2_y_stdz[:5])

These check out OK:

[ 0.9508 -0.5847 -0.2797 -0.4039 -0.598 ]
[ 1.0726 -0.5009 -0.0942 -0.1187 -0.8043]
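
As a cross-check on the manual standardization, the same result can be obtained with scikit-learn's StandardScaler, which also divides by the population standard deviation (the NumPy default, ddof=0). The sketch below uses random stand-in data rather than the dataset from the question. It also shows the small scaling difference that appears if the sample standard deviation (ddof=1) is used instead, which may matter when comparing against other software; whether that applies here is an assumption, not something the question establishes.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(50, 5))   # stand-in for frame2_X

# Manual standardization with broadcasting (population std, ddof=0).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

# StandardScaler centers and scales each column the same way.
X_scaled = StandardScaler().fit_transform(X)
print(np.allclose(X_manual, X_scaled))

# With the sample standard deviation (ddof=1) every value shrinks by
# a constant factor of sqrt(n / (n - 1)).
X_sample = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
ratio = X_manual[0, 0] / X_sample[0, 0]
print(ratio)
```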

Continuing on...

# Create a PCA object
pca = PCA()
pca.fit(frame2_X_stdz)

# Create the matrix of PC estimates
pca.transform(frame2_X_stdz)

Here is the output from the last step:

Out[16]: array([[-2.0536, -1.9627, -0.1666, -0.297 , -0.0122],
       [ 1.382 , -0.382 , -0.5692, -0.0257, -0.0509],
       [ 0.4342,  0.611 ,  0.2701,  0.062 , -0.011 ],
       ..., 
       [ 0.0422,  0.7251, -0.1926,  0.0089,  0.0005],
       [ 1.4502, -0.7115, -0.0733,  0.0013, -0.0557],
       [ 0.258 ,  0.3684,  0.1873,  0.0403,  0.0042]])
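
Since the sign of each component is arbitrary, one way to make outputs comparable across tools is to impose a sign convention after fitting. The sketch below uses a hypothetical helper, fix_signs, that makes the largest-magnitude loading of each component positive and flips the matching score column. This is one possible convention chosen for illustration, not something the question attributes to scikit-learn or SAS, and the data is a random stand-in.

```python
import numpy as np
from sklearn.decomposition import PCA

def fix_signs(components, scores):
    """Hypothetical helper: make each component's largest-magnitude loading positive."""
    rows = np.arange(components.shape[0])
    idx = np.argmax(np.abs(components), axis=1)
    signs = np.sign(components[rows, idx])
    # Flip a component (row) and its score column together.
    return components * signs[:, np.newaxis], scores * signs

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))               # stand-in data
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize as in the question

pca = PCA()
scores = pca.fit_transform(X)
components_fixed, scores_fixed = fix_signs(pca.components_, scores)

# Starting from the fully flipped solution gives the same answer.
components_alt, scores_alt = fix_signs(-pca.components_, -scores)
print(np.allclose(scores_fixed, scores_alt))
```

Applying the same convention to the loadings exported from another tool would make the two score matrices directly comparable, up to numerical differences.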

I've also tried replacing pca.fit() and pca.transform() with pca.fit_transform(), but I end up with the same results.
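
Getting identical results from both routes is expected, because fit_transform fits the model and applies the projection to the same data in one call. A minimal sketch on random stand-in data (not the dataset from the question):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))   # stand-in data

# Two-step route: fit, then transform the same matrix.
pca_two_step = PCA()
pca_two_step.fit(X)
scores_two_step = pca_two_step.transform(X)

# One-step route on a fresh estimator.
scores_one_step = PCA().fit_transform(X)

print(np.allclose(scores_two_step, scores_one_step))
```

Switching between the two routes therefore cannot change the signs; any disagreement with another tool comes from the arbitrary sign choice itself.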

What am I doing wrong here that I'm getting PCs with the signs flipped?
