Finding the dimension with highest variance using scikit-learn PCA


Problem description

I need to use PCA to identify the dimensions with the highest variance in a particular data set. I'm using scikit-learn's PCA for this, but I can't tell from its output which components of my data have the highest variance. Keep in mind that I don't want to eliminate those dimensions, only identify them.

My data is organized as a matrix with 150 rows, each with 4 dimensions. This is what I'm doing:

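import sklearn.decomposition  # the submodule must be imported explicitly
# data_matrix: array of shape (150, 4), one row per sample
pca = sklearn.decomposition.PCA()
pca.fit(data_matrix)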

When I print pca.explained_variance_ratio_, it outputs an array of variance ratios ordered from highest to lowest, but it doesn't tell me which dimension of the data each one corresponds to (I tried changing the order of the columns in my matrix, and the resulting variance ratio array was the same).

Printing pca.components_ gives me a 4x4 matrix (I kept the original number of components) with values whose meaning I can't work out. According to scikit-learn's documentation, they should be the components with the maximum variance (the eigenvectors, perhaps?), but there is no sign of which dimension those values refer to.

Transforming the data doesn't help either, because the dimensions are changed in such a way that I can't tell which ones they were originally.

Is there any way I can get this information with scikit-learn's PCA? Thanks.

Recommended answer

The values in pca.explained_variance_ratio_ are the fractions of the total variance explained by each principal component. You can use them to decide how many dimensions (components) to keep when transforming your data with PCA. You can use a threshold for that (e.g., count how many of the ratios are greater than 0.5, among other criteria). After that, you can transform the data with PCA, setting the number of components to the number of principal components that exceeded the threshold. Note that the reduced dimensions are not the same as the dimensions of the original data: each principal component is a linear combination of the original dimensions.
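As a rough illustration of the thresholding idea, here is a minimal sketch. The random data_matrix is only a hypothetical stand-in for the asker's 150x4 data, the 0.5 threshold is the example value mentioned above, and the fallback to at least one component is an assumption added here so the second PCA always has something to keep.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the asker's data: 150 rows, 4 columns,
# each column scaled to a different spread.
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(150, 4)) * np.array([5.0, 1.0, 0.5, 0.1])

# Fit with all components to inspect the explained variance ratios.
pca = PCA()
pca.fit(data_matrix)
ratios = pca.explained_variance_ratio_  # sorted from highest to lowest

# Count the components whose ratio exceeds the example threshold.
threshold = 0.5
# Assumption: keep at least one component if none passes the threshold.
n_keep = max(1, int(np.sum(ratios > threshold)))

# Refit and transform, keeping only that many components.
reduced = PCA(n_components=n_keep).fit_transform(data_matrix)

print(ratios)
print(n_keep, reduced.shape)
```

The columns of reduced are principal components, so they cannot be mapped one-to-one onto columns of data_matrix.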

The scikit-learn tutorial at this link has example code:

http://scikit-learn.org/dev/tutorial/statistical_inference/unsupervised_learning.html#principal-component-analysis-pca
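The answer above does not show how to relate the results back to the original columns, so the following is only a sketch of one possible approach, not part of the original answer. It assumes that "highest variance dimension" can mean either the original column with the largest variance (which needs no PCA at all) or the original column with the largest absolute weight in each row of pca.components_, whose shape is (n_components, n_features). The random data_matrix is again a hypothetical stand-in.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: column 2 is given the largest spread.
rng = np.random.default_rng(0)
data_matrix = rng.normal(size=(150, 4)) * np.array([1.0, 0.5, 5.0, 0.1])

# Option 1: variance of each original column, computed directly.
per_column_variance = data_matrix.var(axis=0, ddof=1)
highest_variance_column = int(np.argmax(per_column_variance))

# Option 2: each row of components_ is one principal component,
# and each entry in the row is the weight of an original column.
pca = PCA().fit(data_matrix)
loadings = np.abs(pca.components_)
dominant_column_per_component = loadings.argmax(axis=1)

print(per_column_variance)
print(highest_variance_column)
print(dominant_column_per_component)
```

Option 2 is only a heuristic: when several columns are strongly correlated, a component may spread its weight across all of them, and no single column dominates.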
