Finding the dimension with the highest variance using scikit-learn PCA

Question

I need to use PCA to identify the dimensions with the highest variance in a dataset. I'm using scikit-learn's PCA for this, but I can't tell from its output which components of my data have the highest variance. Keep in mind that I don't want to eliminate those dimensions, only identify them.

My data is organized as a matrix with 150 rows, each with 4 dimensions. I'm doing the following:

import sklearn.decomposition

pca = sklearn.decomposition.PCA()
pca.fit(data_matrix)
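A self-contained version of the snippet above, offered as a sketch: it assumes the 150 × 4 matrix resembles scikit-learn's bundled Iris dataset (which happens to have exactly that shape), loaded with load_iris. It also shows that if the only goal is to find which original column has the highest variance, the per-column variance answers that directly, without PCA.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

# Assumption: Iris stands in for the asker's data_matrix (150 rows, 4 columns)
iris = load_iris()
data_matrix = iris.data
feature_names = iris.feature_names

pca = PCA()  # keep all 4 components
pca.fit(data_matrix)
print(pca.explained_variance_ratio_)  # sorted from highest to lowest
print(pca.components_.shape)          # (4, 4): one row per component

# Variance of each ORIGINAL column, computed directly (no PCA involved)
column_variance = data_matrix.var(axis=0)
best_column = int(np.argmax(column_variance))
print(feature_names[best_column], column_variance[best_column])
```

On Iris the column with the largest raw variance is petal length; note that raw variances depend on units, so columns measured on different scales are not directly comparable.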

When I print pca.explained_variance_ratio_, it outputs an array of variance ratios ordered from highest to lowest, but it doesn't tell me which dimension of the data they correspond to (I've tried changing the order of the columns in my matrix, and the resulting variance ratio array was the same).
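The unchanged array is expected: each ratio is one principal component's variance divided by the total variance, and those variances are eigenvalues of the covariance matrix, which do not depend on column order. A minimal sketch on hypothetical random data (the column scales are made up for illustration) that reproduces the observation:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical data: 150 rows, 4 columns with deliberately different spreads
data = rng.normal(size=(150, 4)) * np.array([1.0, 3.0, 0.5, 2.0])

ratios = PCA(svd_solver="full").fit(data).explained_variance_ratio_

# Reorder the columns and fit again
permuted = data[:, [3, 1, 0, 2]]
ratios_permuted = PCA(svd_solver="full").fit(permuted).explained_variance_ratio_

print(ratios)
print(ratios_permuted)
print(np.allclose(ratios, ratios_permuted))  # True: ratios ignore column order
```

So the ratios describe the new, rotated axes; the link back to the original columns lives in pca.components_.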

Printing pca.components_ gives me a 4x4 matrix (I kept the original number of components when creating the PCA) with some values whose meaning I can't understand. According to scikit-learn's documentation, they should be the components with the maximum variance (the eigenvectors, perhaps?), but there is no sign of which dimension those values refer to.
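The guess is right: each row of pca.components_ is a unit-length eigenvector of the covariance matrix, written in terms of the original columns, so entry j of row i (a "loading") says how much original column j contributes to component i. A sketch, again assuming Iris as stand-in data, that checks the eigenvector property against np.cov and reports the original column with the largest absolute loading per component. The top_features name is illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

iris = load_iris()
X = iris.data
names = iris.feature_names

pca = PCA().fit(X)

# Each row of components_ is an eigenvector of the covariance matrix,
# with eigenvalue explained_variance_ (both use the n - 1 denominator)
cov = np.cov(X, rowvar=False)
for vec, val in zip(pca.components_, pca.explained_variance_):
    assert np.allclose(cov @ vec, val * vec)

# Column index = original feature; the sign of a loading is arbitrary, so use abs
top_features = [names[int(np.argmax(np.abs(row)))] for row in pca.components_]
for i, (name, ratio) in enumerate(zip(top_features, pca.explained_variance_ratio_)):
    print(f"PC{i + 1} ({ratio:.1%} of variance): largest loading on {name}")
```

A component usually mixes several columns, so the largest loading is a hint about which original dimension dominates it, not a one-to-one mapping.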

Transforming the data doesn't help either, because the dimensions are changed in a way that makes it impossible for me to tell which original dimension each one came from.
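The transformed columns cannot be matched to original ones because each is a projection: with the default whiten=False, pca.transform(X) equals (X - pca.mean_) @ pca.components_.T, so every output column is a weighted sum of all centred input columns, with weights taken from one row of components_. A minimal check on hypothetical random data:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 4))  # hypothetical stand-in data

pca = PCA().fit(X)
projected = pca.transform(X)

# Same result by hand: centre the data, then project onto each component
manual = (X - pca.mean_) @ pca.components_.T
print(np.allclose(projected, manual))  # True

# Because components_ is orthonormal, the projection can be undone
restored = pca.inverse_transform(projected)
print(np.allclose(restored, X))  # True when all components are kept
```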

Is there any way to get this information with scikit-learn's PCA? Thanks

Recommended answer

The values in pca.explained_variance_ratio_ are the fractions of the total variance explained by each principal component. You can use them to decide how many dimensions (components) PCA should keep when transforming your data, for example by applying a threshold (counting how many ratios are greater than 0.5, or using some other criterion). After that, you can transform the data with PCA, setting the number of components to the number of principal components above the threshold. Note that the data reduced to these dimensions are not the same as the data along the original dimensions: each new dimension is a combination of the original ones.
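The threshold procedure described in this answer can be sketched as follows, using the answer's example cut-off of 0.5 and, as an assumption, Iris as the data. The reduced columns are new PCA coordinates, not a subset of the original columns.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # assumption: stand-in for the asker's 150 x 4 matrix

# Step 1: fit with all components and inspect the variance ratios
ratios = PCA().fit(X).explained_variance_ratio_

# Step 2: count components whose ratio exceeds the threshold (0.5, as in the answer);
# keep at least one so the refit below is always valid
threshold = 0.5
n_keep = max(1, int(np.sum(ratios > threshold)))

# Step 3: refit keeping only that many components and transform
reduced = PCA(n_components=n_keep).fit_transform(X)
print(ratios.round(4))
print(n_keep, reduced.shape)
```

A per-component cut-off is only one option; a common alternative is to keep components until their cumulative ratio reaches a target such as 0.95.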

You can check the code at this link:

http://scikit-learn.org/dev/tutorial/statistical_inference/unsupervised_learning.html#principal-component-analysis-pca
