In sklearn.decomposition.PCA, why are components_ negative?

Question

I'm trying to follow along with Abdi & Williams - Principal Component Analysis (2010) and build principal components through SVD, using numpy.linalg.svd.

When I display the components_ attribute from a fitted PCA with sklearn, they're of the exact same magnitude as the ones that I've manually computed, but some (not all) are of opposite sign. What's causing this?

Update: my (partial) answer below contains some additional info.

Take the following example data:

from pandas_datareader.data import DataReader as dr
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# sample data - shape (20, 3), each column standardized to N~(0,1)
rates = scale(dr(['DGS5', 'DGS10', 'DGS30'], 'fred', 
           start='2017-01-01', end='2017-02-01').pct_change().dropna())

# with sklearn PCA:
pca = PCA().fit(rates)
print(pca.components_)
[[-0.58365629 -0.58614003 -0.56194768]
 [-0.43328092 -0.36048659  0.82602486]
 [-0.68674084  0.72559581 -0.04356302]]

# compare to the manual method via SVD:
u, s, Vh = np.linalg.svd(np.asmatrix(rates), full_matrices=False)
print(Vh)
[[ 0.58365629  0.58614003  0.56194768]
 [ 0.43328092  0.36048659 -0.82602486]
 [-0.68674084  0.72559581 -0.04356302]]

# odd: some, but not all signs reversed
print(np.isclose(Vh, -1 * pca.components_))
[[ True  True  True]
 [ True  True  True]
 [False False False]]

Answer

As you figured out in your answer, the results of a singular value decomposition (SVD) are not unique in terms of singular vectors. Indeed, if the SVD of X is X = \sum_{i=1}^r s_i u_i v_i^\top,

with the s_i ordered in decreasing fashion, then you can see that you can change the sign (i.e., "flip") of, say, u_1 and v_1: the minus signs cancel, so the formula still holds.

This shows that the SVD is unique up to a change in sign in pairs of left and right singular vectors.
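
As a quick numerical check (a minimal sketch on random data, independent of the example above), flipping the sign of one matched pair of singular vectors leaves the reconstruction of X unchanged:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the first left/right singular vector pair.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1

# Both factorizations rebuild X exactly: the minus signs cancel.
print(np.allclose(U @ np.diag(s) @ Vt, X))    # True
print(np.allclose(U2 @ np.diag(s) @ Vt2, X))  # True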

Since PCA is just an SVD of X (or an eigenvalue decomposition of X^\top X), there is no guarantee that it returns the same result every time it is performed on the same X. Understandably, the scikit-learn implementation wants to avoid this: it guarantees that the left and right singular vectors returned (stored in U and V) are always the same, by imposing the (arbitrary) convention that the coefficient of u_i with the largest absolute value be positive.
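
For illustration (reusing rates and pca from the snippet above, and assuming the data is already centered, which scale() ensures here), an eigendecomposition of X^\top X recovers the same directions as pca.components_, but only up to these sign flips:

# Eigendecomposition of X^T X; rates already has zero-mean columns.
evals, evecs = np.linalg.eigh(rates.T @ rates)
order = np.argsort(evals)[::-1]        # eigh returns eigenvalues in ascending order
eig_components = evecs[:, order].T     # rows = principal directions

# Same directions as pca.components_, but the signs are not determined.
print(np.allclose(np.abs(eig_components), np.abs(pca.components_)))  # expected True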

As you can see by reading the source: first they compute U and V with linalg.svd(). Then, for each vector u_i (i.e., column of U), if its element of largest absolute value is positive, they don't do anything. Otherwise, they change u_i to -u_i and the corresponding right singular vector, v_i, to -v_i. As noted earlier, this does not change the SVD formula, since the minus signs cancel out. However, it guarantees that the U and V returned after this processing are always the same, because the sign indeterminacy has been removed.
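
A minimal sketch of that post-processing, modeled on the svd_flip helper scikit-learn uses internally (the exact convention may differ across versions), applied to the manual u and Vh from the question:

# Work with plain arrays (the question's code wrapped rates in np.asmatrix).
u_arr, Vh_arr = np.asarray(u), np.asarray(Vh)

# For each left singular vector (column of u), make its largest-magnitude
# entry positive, and flip the matching right singular vector (row of Vh).
max_abs_rows = np.argmax(np.abs(u_arr), axis=0)
signs = np.sign(u_arr[max_abs_rows, range(u_arr.shape[1])])
Vh_flipped = Vh_arr * signs[:, np.newaxis]

print(np.allclose(Vh_flipped, pca.components_))  # expected True under this convention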
