scikit-learn TruncatedSVD's explained variance ratio not in descending order


Question

Unlike sklearn's PCA, TruncatedSVD's explained variance ratio is not in descending order. I looked at the source code, and the two classes seem to calculate the explained variance ratio in different ways:

TruncatedSVD:

U, Sigma, VT = randomized_svd(X, self.n_components,
                              n_iter=self.n_iter,
                              random_state=random_state)
X_transformed = np.dot(U, np.diag(Sigma))
self.explained_variance_ = exp_var = np.var(X_transformed, axis=0)
if sp.issparse(X):
    _, full_var = mean_variance_axis(X, axis=0)
    full_var = full_var.sum()
else:
    full_var = np.var(X, axis=0).sum()
self.explained_variance_ratio_ = exp_var / full_var

PCA:

U, S, V = linalg.svd(X, full_matrices=False)
explained_variance_ = (S ** 2) / n_samples
explained_variance_ratio_ = (explained_variance_ /
                             explained_variance_.sum())

PCA calculates explained_variance_ directly from the singular values (sigma), and since sigma is in descending order, explained_variance_ is in descending order too. TruncatedSVD, on the other hand, calculates explained_variance_ as the variance of the columns of the transformed matrix, so the variances are not necessarily in descending order.
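The difference can be checked numerically. The sketch below is not taken from the scikit-learn source; it uses a plain NumPy SVD on made-up, uncentered data (the array X and its shifted first feature are assumptions for illustration). For a column of U * Sigma, the variance equals sigma**2 / n_samples minus the squared column mean, so a component with a large mean can have a large singular value and still a small variance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: feature 0 has a large mean and a tiny spread,
# so the matrix is far from centered.
X = rng.normal(size=(200, 5))
X[:, 0] = 10.0 + 0.1 * X[:, 0]
n_samples = X.shape[0]

# Exact SVD of the uncentered matrix (no mean subtraction).
U, S, Vt = np.linalg.svd(X, full_matrices=False)
X_transformed = U * S  # same result as np.dot(U, np.diag(S))

# TruncatedSVD-style quantity: variance of each transformed column.
col_var = np.var(X_transformed, axis=0)
col_mean = X_transformed.mean(axis=0)

# Quantity computed directly from the singular values.
from_sigma = S ** 2 / n_samples

# Each column of U has unit norm, so mean(column**2) == S**2 / n_samples
# and therefore var == S**2 / n_samples - mean**2.
print(np.allclose(col_var, from_sigma - col_mean ** 2))

print("singular values:     ", S)
print("S**2 / n_samples:    ", from_sigma)
print("variance of columns: ", col_var)
```

When X is centered first, the column means of U * Sigma are zero and the two quantities coincide, which is why the ordering issue only shows up on uncentered input.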

Does this mean that I need to sort explained_variance_ratio_ from TruncatedSVD first in order to find the top k principal components?
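If the aim is to rank components by explained variance rather than by singular value, one possible approach is to take an argsort of the ratios and index components_ with it. This is only a sketch on made-up data; the array X, the choice k = 3, and the names order and top_k_components are illustrative assumptions, not part of the original question.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)

# Illustrative uncentered data: feature 0 has a large mean, small spread.
X = rng.normal(size=(200, 20))
X[:, 0] = 10.0 + 0.1 * X[:, 0]

svd = TruncatedSVD(n_components=5, random_state=0)
svd.fit(X)

ratio = svd.explained_variance_ratio_

# Indices of the components, from largest to smallest explained variance.
order = np.argsort(ratio)[::-1]

k = 3  # hypothetical number of components to keep
top_k = order[:k]
top_k_components = svd.components_[top_k]

print("ratios as returned:   ", ratio)
print("ratios after sorting: ", ratio[order])
print("indices of the top k: ", top_k)
```

Whether this reordering is wanted depends on the goal: the components are already ordered by singular value, and that ordering only differs from the variance ordering when the data is not centered.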

Recommended answer

You don't have to sort explained_variance_ratio_. The components are returned in order of decreasing singular value, and the output contains only n_components values.
From the documentation:

TruncatedSVD implements a variant of singular value decomposition (SVD) that only computes the k largest singular values, where k is a user-specified parameter.

X_transformed contains the decomposition using only k components.
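To make the shapes concrete, here is a minimal sketch on made-up dense data (the array X and k = 3 are illustrative assumptions). It shows that the transformed matrix has one column per component, and that transform projects the data onto the rows of components_.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))  # illustrative data: 50 samples, 10 features

k = 3
svd = TruncatedSVD(n_components=k, random_state=0)
X_transformed = svd.fit_transform(X)

print(X_transformed.shape)    # (50, 3): one column per component
print(svd.components_.shape)  # (3, 10): one row per component

# transform() projects X onto the k right singular vectors.
print(np.allclose(svd.transform(X), X @ svd.components_.T))

# Mapping back gives a rank-k approximation with the original shape.
X_approx = svd.inverse_transform(X_transformed)
print(X_approx.shape)         # (50, 10)
```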

The following example illustrates this:

>>> from sklearn.decomposition import TruncatedSVD
>>> from sklearn.random_projection import sparse_random_matrix
>>> X = sparse_random_matrix(100, 100, density=0.01, random_state=42)
>>> svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
>>> svd.fit(X)  
TruncatedSVD(algorithm='randomized', n_components=5, n_iter=7,
        random_state=42, tol=0.0)
>>> print(svd.explained_variance_ratio_)  
[0.0606... 0.0584... 0.0497... 0.0434... 0.0372...]
>>> print(svd.explained_variance_ratio_.sum())  
0.249...
>>> print(svd.singular_values_)  
[2.5841... 2.5245... 2.3201... 2.1753... 2.0443...]
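Depending on the installed scikit-learn version, sparse_random_matrix may not be importable from sklearn.random_projection. The variant below is a sketch that swaps in scipy.sparse.random to build the test matrix, so its numbers will differ from the output above. It checks the ordering of both attributes programmatically instead of relying on printed values.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Substitute test matrix (assumption: any sparse 100 x 100 matrix will do).
X = sparse_random(100, 100, density=0.01, format="csr", random_state=42)

svd = TruncatedSVD(n_components=5, n_iter=7, random_state=42)
svd.fit(X)

sing = svd.singular_values_
ratio = svd.explained_variance_ratio_

print("singular values:", sing)
print("explained variance ratio:", ratio)

# np.diff(a) <= 0 everywhere means a is in non-increasing order.
print("singular values descending:", bool(np.all(np.diff(sing) <= 0)))
print("ratios descending:", bool(np.all(np.diff(ratio) <= 0)))
```

The second check is the one to watch: if it prints False for a given dataset, the ratios would need an explicit argsort before picking components by explained variance.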
