How to normalize with PCA and scikit-learn

Question

Let me keep this brief. What I want to know is whether I should do this:

pca.fit(normalize(x))
new=pca.transform(normalize(x))

or this:

pca.fit(normalize(x))
new=pca.transform(x)

I know that data should be normalized before applying PCA, but which of the two procedures above is correct with sklearn?

Recommended answer

In general, you want the first option.

Normalizing places your data in a new space, and that is the space the PCA is fitted in. Its transform expects any data it receives to be in that same space, so whatever preprocessing was applied before fit must also be applied before transform.
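To make the difference between the two options concrete, here is a minimal sketch. The data, the feature scales, and the variable names are made up for illustration and are not from the question. It fits the PCA on normalized rows and then transforms both normalized and raw data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)
# Hypothetical data: 20 samples, 5 features with very uneven scales
x = rng.normal(size=(20, 5)) * np.array([1.0, 10.0, 100.0, 1000.0, 5.0])

pca = PCA(n_components=2)
pca.fit(normalize(x))  # the PCA learns its mean and components from unit-norm rows

consistent = pca.transform(normalize(x))  # option 1: same space as during fit
inconsistent = pca.transform(x)           # option 2: raw data, a different space

# Unit-norm rows stay within a small range after projection;
# the raw rows do not, because the PCA never saw data on that scale.
print("option 1 max |value|:", np.abs(consistent).max())
print("option 2 max |value|:", np.abs(inconsistent).max())
```

Both calls run without error, which is why the second option is easy to get wrong: nothing in PCA checks that the input was preprocessed the same way as the training data.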

Scikit-learn provides tools to do this transparently and conveniently by chaining estimators in a pipeline. Try:

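from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline

import numpy as np

# Random example data: 20 samples, 40 features
data = np.random.randn(20, 40)

# The scaler runs first; its output is what the PCA is fitted on
pipeline = Pipeline([('scaling', StandardScaler()), ('pca', PCA(n_components=5))])

# Fit both steps and return the 5-component projection of the scaled data
reduced = pipeline.fit_transform(data)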

The prepended scaler then always applies its transformation to the data before they reach the PCA object.
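As a quick check of that behaviour, the sketch below fits the same kind of pipeline and compares pipeline.transform on unseen data with applying the two fitted steps by hand. The names train and new_data are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
train = rng.normal(size=(20, 40))    # hypothetical training data
new_data = rng.normal(size=(5, 40))  # hypothetical unseen samples

pipeline = Pipeline([("scaling", StandardScaler()), ("pca", PCA(n_components=5))])
pipeline.fit(train)

# The pipeline scales new_data with the statistics learned from train,
# then projects it with the fitted PCA.
via_pipeline = pipeline.transform(new_data)

# The same two steps applied manually, using the fitted step objects
scaler = pipeline.named_steps["scaling"]
pca = pipeline.named_steps["pca"]
by_hand = pca.transform(scaler.transform(new_data))

print(np.allclose(via_pipeline, by_hand))
```

Using the pipeline means the scaling step cannot be forgotten at transform time, which is exactly the mistake in the second option of the question.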

As @larsmans points out, you may want to use sklearn.preprocessing.Normalizer instead of StandardScaler or, alternatively, remove the mean centering from StandardScaler by passing the keyword argument with_mean=False.
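A rough sketch of both variants follows, again on made-up random data. Normalizer is the estimator counterpart of the normalize function used in the question, so it scales each row to unit norm, whereas StandardScaler works per column. Which one is appropriate depends on the data and is not settled here.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Normalizer, StandardScaler, normalize

rng = np.random.default_rng(0)
data = rng.normal(size=(20, 40))  # hypothetical data

# Variant 1: row-wise normalization inside the pipeline
norm_pipeline = Pipeline([("normalize", Normalizer()), ("pca", PCA(n_components=5))])
reduced = norm_pipeline.fit_transform(data)

# Normalizer gives the same result as calling normalize() directly
rows_match = np.allclose(Normalizer().fit_transform(data), normalize(data))

# Variant 2: column-wise scaling to unit variance without mean centering
uncentered = StandardScaler(with_mean=False).fit_transform(data)

print(reduced.shape, rows_match)
print("column std after scaling:", uncentered.std(axis=0)[:3])
```

Note that PCA centers its input itself, so with_mean=False only changes what the scaler outputs, not whether the PCA sees centered data.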
