Extracting PCA components with sklearn

Question

I am using sklearn's PCA for dimensionality reduction on a large set of images. Once the PCA is fitted, I would like to see what the components look like.

One can do so by looking at the components_ attribute. Not realizing that was available, I did something else instead:

import numpy as np

# One-hot vectors in PCA space: row i has a 1 in position i and 0 elsewhere
each_component = np.eye(total_components)
component_im_array = pca.inverse_transform(each_component)

for i in range(num_components):
    component_im = component_im_array[i, :].reshape(height, width)
    # do something with component_im
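For comparison, here is a minimal sketch of the direct route through `components_`. It assumes random synthetic data in place of the real images; `images`, `height`, `width` and the component count are hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in data: 100 random 8x6 "images", flattened to rows
rng = np.random.default_rng(0)
height, width = 8, 6
images = rng.random((100, height * width))

pca = PCA(n_components=5)
pca.fit(images)

# Each row of components_ is one principal axis in pixel space,
# so it can be reshaped straight back onto the image grid
component_images = pca.components_.reshape(-1, height, width)
print(component_images.shape)  # (5, 8, 6)
```

No inverse transform is involved here, so nothing beyond the component itself ends up in each image.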

In other words, I create an image in PCA space in which every feature is 0 except one, which is set to 1. Inverse-transforming it should then give me the image in the original space that, once transformed, can be expressed solely with that PCA component.
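The construction can be worked through explicitly. This assumes `whiten=False` (sklearn's default). The notation is introduced here for illustration: $W$ is `components_`, a $k \times d$ matrix whose rows $w_i$ are orthonormal, $\mu$ is `mean_`, and $e_i$ is the $i$-th row of the $k \times k$ identity.

```latex
\begin{aligned}
\text{inverse\_transform}(z) &= zW + \mu, \qquad \text{transform}(x) = (x - \mu)\,W^{\top},\\
x_i = \text{inverse\_transform}(e_i) &= e_i W + \mu = w_i + \mu,\\
\text{transform}(x_i) &= (w_i + \mu - \mu)\,W^{\top} = w_i W^{\top} = e_i \qquad (\text{since } WW^{\top} = I_k).
\end{aligned}
```

The round trip is therefore consistent: $x_i$ really is expressed solely by component $i$ in PCA space. The reconstructed image itself, however, is $w_i + \mu$ rather than $w_i$.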

The following image shows the results: on the left is the component computed with my method, and on the right is pca.components_[i] taken directly. With my method, most images are very similar (though not identical), whereas the images taken from components_ differ a lot from one another, as I would have expected.

Is there a conceptual problem in my method? The components from pca.components_[i] are clearly correct (or at least more correct than the ones I'm getting). Thanks!

Answer

The difference between grabbing components_ and running inverse_transform on the identity matrix is that the latter adds the empirical mean of each feature. In simplified form:

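import numpy as np
def inverse_transform(self, X):
    # Simplified (whiten=False): map PCA coordinates back, then re-add the training mean
    return np.dot(X, self.components_) + self.mean_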
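A quick way to check that this simplified formula matches sklearn's own inverse_transform is a minimal sketch on hypothetical random data. The default `whiten=False` is assumed, and the data shape and component count are arbitrary.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.random((50, 10))  # hypothetical data: 50 samples, 10 features

pca = PCA(n_components=3)  # whiten=False is the default
Z = pca.fit_transform(X)

# The simplified formula from the answer, applied by hand
manual = np.dot(Z, pca.components_) + pca.mean_
print(np.allclose(manual, pca.inverse_transform(Z)))  # True
```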

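where self.mean_ was estimated from the training set. (This is the whiten=False case; with whiten=True, sklearn also scales the components by the square root of explained_variance_.) Each row of inverse_transform(np.eye(k)) is therefore components_[i] + mean_. The same mean image is added to every component, which is why your results look so similar, and subtracting pca.mean_ from each row recovers components_.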
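The fix can be sketched on synthetic data. This assumes `whiten=False`, and the offset of 5.0 is a hypothetical choice that gives the data a large mean, so the effect seen in the question is easy to reproduce.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.random((200, 12)) + 5.0  # hypothetical data with a large mean

pca = PCA(n_components=4).fit(X)
k = pca.n_components_

# The asker's trick: inverse-transform one-hot vectors from PCA space
via_identity = pca.inverse_transform(np.eye(k))

# Each row is components_[i] + mean_, so subtracting the mean recovers it
print(np.allclose(via_identity - pca.mean_, pca.components_))  # True

# In this synthetic data the shared mean term is much larger than any
# unit-norm component, which is why the un-corrected rows look alike
print(np.linalg.norm(pca.mean_), np.linalg.norm(pca.components_[0]))
```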