Python scikit learn pca.explained_variance_ratio_ cutoff
Question
When choosing the number of principal components (k), we choose k to be the smallest value such that, for example, 99% of the variance is retained.
However, in Python scikit-learn, I am not 100% sure that pca.explained_variance_ratio_ = 0.99 is equal to "99% of the variance is retained". Could anyone enlighten me? Thanks.

- The Python scikit-learn PCA manual is here
Answer
Yes, you are nearly right. The pca.explained_variance_ratio_ attribute returns a vector of the variance explained by each dimension. Thus pca.explained_variance_ratio_[i] gives the variance explained solely by the (i+1)-th dimension.
You probably want to do pca.explained_variance_ratio_.cumsum(). That will return a vector x such that x[i] gives the cumulative variance explained by the first i+1 dimensions.
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)
my_model = PCA(n_components=5)
my_model.fit_transform(my_matrix)
print(my_model.explained_variance_)
print(my_model.explained_variance_ratio_)
print(my_model.explained_variance_ratio_.cumsum())
[ 1.50756565 1.29374452 0.97042041 0.61712667 0.31529082]
[ 0.32047581 0.27502207 0.20629036 0.13118776 0.067024 ]
[ 0.32047581 0.59549787 0.80178824 0.932976 1. ]
So in my random toy data, if I picked k=4, I would retain 93.3% of the variance.
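To automate the cutoff, one straightforward approach (a minimal sketch; the variable names `cumulative`, `threshold`, and `k` are mine, not from the original answer) is to take the cumulative sum and pick the first index that reaches the desired threshold:

```python
import numpy as np
from sklearn.decomposition import PCA

np.random.seed(0)
my_matrix = np.random.randn(20, 5)
my_model = PCA(n_components=5)
my_model.fit(my_matrix)

# cumulative variance explained by the first 1, 2, ..., 5 components
cumulative = my_model.explained_variance_ratio_.cumsum()

# smallest k such that the first k components retain >= threshold of the variance
threshold = 0.99
k = int(np.argmax(cumulative >= threshold)) + 1
print(k)  # on this toy data, all 5 components are needed to reach 99%
```

Note also that scikit-learn can do this selection for you: passing a float between 0 and 1 as n_components (e.g. PCA(n_components=0.99)) tells PCA to keep just enough components to explain at least that fraction of the variance.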