Normalize PCA with scikit-learn when data is split

Views: 159 Published: 2020/7/31 4:14:19 Tags: python scikit-learn pca

Question

I have a follow-up question to: How to normalize with PCA and scikit-learn.

I'm creating an emotion detection system and what I do now is:

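1. Split the data by emotion (distribute the data over multiple subsets).
2. Add all the data together (merge the subsets into one set).
3. Get the PCA parameters of the combined data (self.pca = RandomizedPCA(n_components=self.n_components, whiten=True).fit(self.data)).
4. For each emotion (each subset), apply the PCA to the data of that emotion (subset).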

I should do the normalization at: step 2) Normalize all combined data, and step 4) normalize the subsets.
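
The four steps, with normalization at steps 2 and 4, can be sketched roughly as follows. This is only an illustration under stated assumptions: the emotion labels, the random feature arrays, and n_components=3 are made up, and PCA stands in for RandomizedPCA, which is not available in recent scikit-learn releases.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import normalize

rng = np.random.default_rng(0)

# Step 1: data split per emotion (hypothetical labels, synthetic features)
subsets = {
    "happy": rng.random((6, 5)),
    "sad": rng.random((4, 5)),
}

# Step 2: combine all subsets into one set and normalize it
combined = normalize(np.vstack(list(subsets.values())))

# Step 3: fit PCA on the combined, normalized data
pca = PCA(n_components=3, whiten=True).fit(combined)

# Step 4: normalize each subset, then project it with the fitted PCA
projected = {
    emotion: pca.transform(normalize(features))
    for emotion, features in subsets.items()
}

for emotion, values in projected.items():
    print(emotion, values.shape)
```

The PCA is fitted once on the combined data and only applied (never refitted) per subset, so every subset is projected onto the same components.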

I was wondering whether normalizing over all the data and normalizing over a subset give the same result. When I tried to simplify my example at the suggestion of @BartoszKP, I realized that my understanding of how the normalization works was wrong. The normalization works the same way in both cases, so this is a valid way to do it, right? (See code.)

from sklearn.preprocessing import normalize
# RandomizedPCA is no longer available in recent scikit-learn releases;
# PCA with svd_solver='randomized' takes its place.
from sklearn.decomposition import PCA
import numpy as np

data_1 = np.array(([52, 254], [4, 128]), dtype='f')
data_2 = np.array(([39, 213], [123, 7]), dtype='f')
data_combined = np.vstack((data_1, data_2))
# print(data_combined)
"""
Output
[[  52.  254.]
 [   4.  128.]
 [  39.  213.]
 [ 123.    7.]]
"""
# Normalize all data (each row is scaled to unit L2 norm)
data_norm = normalize(data_combined)
print(data_norm)
"""
[[ 0.20056452  0.97968054]
 [ 0.03123475  0.99951208]
 [ 0.18010448  0.98364753]
 [ 0.99838448  0.05681863]]
"""

# n_components cannot exceed min(n_samples, n_features), which is 2 here
pca = PCA(n_components=2, whiten=True, svd_solver='randomized')
pca.fit(data_norm)

# Normalize a subset of the data
data_1_norm = normalize(data_1)
print(data_1_norm)
"""
[[ 0.20056452  0.97968054]
 [ 0.03123475  0.99951208]]
"""
pca.transform(data_1_norm)
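
The observation that both normalizations agree can be checked directly. The sketch below reuses the arrays from the question; the names combined_norm, subset_norm, same, and row_norms are introduced here for illustration only. It relies on normalize() scaling each row independently (its defaults are norm='l2' and axis=1), so a row's result does not depend on which other rows are present.

```python
import numpy as np
from sklearn.preprocessing import normalize

data_1 = np.array([[52, 254], [4, 128]], dtype="f")
data_2 = np.array([[39, 213], [123, 7]], dtype="f")
data_combined = np.vstack((data_1, data_2))

# Each row is divided by its own L2 norm, independently of the other rows
combined_norm = normalize(data_combined)
subset_norm = normalize(data_1)

# Compare the first two rows of the combined result with the subset result
same = np.allclose(combined_norm[:2], subset_norm)
print(same)

# Every normalized row should have (approximately) unit length
row_norms = np.linalg.norm(combined_norm, axis=1)
print(row_norms)
```

This equivalence holds for per-row normalization only. A per-feature scaler such as StandardScaler computes column statistics, which generally differ between a subset and the combined data.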
