In sklearn.decomposition.PCA, why are components_ negative?

Question

I'm trying to follow along with Abdi & Williams - Principal Component Analysis (2010) and build principal components through SVD, using numpy.linalg.svd.

When I display the components_ attribute from a fitted PCA with sklearn, they're of the exact same magnitude as the ones that I've manually computed, but some (not all) are of opposite sign. What's causing this?

Update: my (partial) answer below contains some additional info.

Take the following example data:

from pandas_datareader.data import DataReader as dr
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import scale

# sample data - shape (20, 3), each column standardized to N~(0,1)
rates = scale(dr(['DGS5', 'DGS10', 'DGS30'], 'fred', 
           start='2017-01-01', end='2017-02-01').pct_change().dropna())

# with sklearn PCA:
pca = PCA().fit(rates)
print(pca.components_)
[[-0.58365629 -0.58614003 -0.56194768]
 [-0.43328092 -0.36048659  0.82602486]
 [-0.68674084  0.72559581 -0.04356302]]

# compare to the manual method via SVD:
u, s, Vh = np.linalg.svd(np.asmatrix(rates), full_matrices=False)
print(Vh)
[[ 0.58365629  0.58614003  0.56194768]
 [ 0.43328092  0.36048659 -0.82602486]
 [-0.68674084  0.72559581 -0.04356302]]

# odd: some, but not all signs reversed
print(np.isclose(Vh, -1 * pca.components_))
[[ True  True  True]
 [ True  True  True]
 [False False False]]

Answer

As you figured out in your answer, the results of a singular value decomposition (SVD) are not unique in terms of singular vectors. Indeed, if the SVD of X is X = \sum_{i=1}^r s_i u_i v_i^\top,

with the s_i ordered in decreasing fashion, then you can see that you can change the sign (i.e., "flip") of, say, u_1 and v_1: the minus signs cancel, so the formula still holds.

This shows that the SVD is unique up to a change in sign in pairs of left and right singular vectors.
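
As a quick numerical check (a minimal sketch on random data, independent of the example above), flipping the sign of one matched pair of singular vectors leaves the reconstruction of X unchanged:

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Flip the first left/right singular vector pair.
U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1
Vt2[0, :] *= -1

# Both factorizations rebuild X exactly: the minus signs cancel.
print(np.allclose(U @ np.diag(s) @ Vt, X))    # True
print(np.allclose(U2 @ np.diag(s) @ Vt2, X))  # True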

Since PCA is just an SVD of X (or an eigenvalue decomposition of X^\top X), there is no guarantee that it returns the same result every time it is performed on the same X. Understandably, the scikit-learn implementation wants to avoid this: it guarantees that the left and right singular vectors returned (stored in U and V) are always the same, by imposing the (arbitrary) convention that the coefficient of u_i with the largest absolute value be positive.
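
For illustration (reusing rates and pca from the snippet above, and assuming the data is already centered, which scale() ensures here), an eigendecomposition of X^\top X recovers the same directions as pca.components_, but only up to these sign flips:

# Eigendecomposition of X^T X; rates already has zero-mean columns.
evals, evecs = np.linalg.eigh(rates.T @ rates)
order = np.argsort(evals)[::-1]        # eigh returns eigenvalues in ascending order
eig_components = evecs[:, order].T     # rows = principal directions

# Same directions as pca.components_, but the signs are not determined.
print(np.allclose(np.abs(eig_components), np.abs(pca.components_)))  # expected True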

As you can see by reading the source: first they compute U and V with linalg.svd(). Then, for each vector u_i (i.e., column of U), if its element of largest absolute value is positive, they don't do anything. Otherwise, they change u_i to -u_i and the corresponding right singular vector, v_i, to -v_i. As noted earlier, this does not change the SVD formula, since the minus signs cancel out. However, it guarantees that the U and V returned after this processing are always the same, because the sign indeterminacy has been removed.
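
A minimal sketch of that post-processing, modeled on the svd_flip helper scikit-learn uses internally (the exact convention may differ across versions), applied to the manual u and Vh from the question:

# Work with plain arrays (the question's code wrapped rates in np.asmatrix).
u_arr, Vh_arr = np.asarray(u), np.asarray(Vh)

# For each left singular vector (column of u), make its largest-magnitude
# entry positive, and flip the matching right singular vector (row of Vh).
max_abs_rows = np.argmax(np.abs(u_arr), axis=0)
signs = np.sign(u_arr[max_abs_rows, range(u_arr.shape[1])])
Vh_flipped = Vh_arr * signs[:, np.newaxis]

print(np.allclose(Vh_flipped, pca.components_))  # expected True under this convention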
