How to use scikit-learn PCA for features reduction and know which features are discarded


Question

I am trying to run a PCA on a matrix of dimensions m x n where m is the number of features and n the number of samples.

Suppose I want to preserve the nf features with the maximum variance. With scikit-learn I am able to do it in this way:

from sklearn.decomposition import PCA

nf = 100
pca = PCA(n_components=nf)
# X is the matrix transposed (n samples on the rows, m features on the columns)
pca.fit(X)

X_new = pca.transform(X)

Now, I get a new matrix X_new that has a shape of n x nf. Is it possible to know which features have been discarded, and which have been retained?

Thanks

Answer

The features that your PCA object has determined during fitting are in pca.components_. The vector space orthogonal to the one spanned by pca.components_ is discarded.
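A minimal sketch of how to inspect what the fitted PCA object keeps (the data here is random and purely illustrative): each row of `pca.components_` is one retained direction, expressed as weights over the original feature columns, and `pca.explained_variance_ratio_` tells you how much variance each direction captures.

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 200 samples, 10 original features (purely illustrative)
rng = np.random.RandomState(0)
X = rng.randn(200, 10)

pca = PCA(n_components=3)
pca.fit(X)

# Each row of components_ is one principal direction, expressed as
# weights over ALL 10 original features -- no single column is dropped.
print(pca.components_.shape)          # (3, 10)

# Fraction of the total variance captured by each retained direction
print(pca.explained_variance_ratio_)
```

Anything orthogonal to the three rows of `components_` is what gets discarded by `transform`.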

Please note that PCA does not "discard" or "retain" any of your pre-defined features (encoded by the columns you specify). It mixes all of them (by weighted sums) to find orthogonal directions of maximum variance.
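The "weighted sums" point can be checked directly: with the default `whiten=False`, `transform` just centers the data and projects it onto the component directions, so every output column mixes every input column. A small sketch with random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
X = rng.randn(50, 5)

pca = PCA(n_components=2).fit(X)
X_new = pca.transform(X)

# transform() is a weighted sum of ALL centered original features:
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(X_new, manual))  # True
```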

If this is not the behaviour you are looking for, then PCA dimensionality reduction is not the way to go. For some simple general feature selection methods, you can take a look at sklearn.feature_selection.
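For contrast, a minimal feature-selection sketch (using `SelectKBest` on the iris dataset as an arbitrary example): unlike PCA, a selector keeps a subset of the original columns, so you can ask exactly which ones were retained via `get_support`.

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)   # 150 samples, 4 features

# Keep the 2 original features with the highest ANOVA F-score
selector = SelectKBest(score_func=f_classif, k=2)
X_new = selector.fit_transform(X, y)

# Indices of the retained ORIGINAL features -- the question PCA cannot answer
kept = selector.get_support(indices=True)
print(kept)
print(X_new.shape)  # (150, 2)
```

Here `X_new` is literally two of the original columns, whereas PCA's output columns are linear combinations of all of them.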

