Apply PCA to the test data


Question

I am trying to perform PCA in Python using sklearn. I have written the following function:

from sklearn.decomposition import PCA


def dimensionality_reduction(train_dataset_mod1, train_dataset_mod2, test_dataset_mod1, test_dataset_mod2):
    # Fitted on the TRANSPOSED training data (features as rows, samples as columns)
    pca = PCA(n_components=200)
    pca.fit(train_dataset_mod1.transpose())
    mod1_features_train = pca.components_
    pca2 = PCA(n_components=200)
    pca2.fit(train_dataset_mod2.transpose())
    mod2_features_train = pca2.components_
    # Test data is passed UNtransposed (samples as rows, features as columns)
    mod1_features_test = pca.transform(test_dataset_mod1)
    mod2_features_test = pca2.transform(test_dataset_mod2)

    return (mod1_features_train.transpose(), mod2_features_train.transpose(),
            mod1_features_test, mod2_features_test)

My matrix sizes are as follows:

train_dataset_mod1 733x5000
test_dataset_mod1 360x5000
mod1_features_train 200x733
train_dataset_mod2 733x8000
test_dataset_mod2 360x8000
mod2_features_train 200x733

However, when I run the whole script, I get the following error:

File "\Anaconda2\lib\site-packages\sklearn\decomposition\base.py", line 132, in transform
    X = X - self.mean_
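
A minimal reproduction of this error, sketched with small random matrices as hypothetical stand-ins for the real datasets (60 training samples, 30 test samples, 100 features, 10 components). Fitting on the transposed training matrix makes sklearn treat the 100 features as samples and the 60 training samples as features, so `mean_` has 60 entries and `transform` rejects test data that has 100 columns. Older sklearn versions fail at the `X - self.mean_` broadcast shown above; newer ones check the feature count first, but both raise `ValueError`.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Hypothetical stand-ins: rows are samples, columns are features.
train = rng.normal(size=(60, 100))
test = rng.normal(size=(30, 100))

pca = PCA(n_components=10)
# Fitting on train.T treats the 100 features as samples
# and the 60 training samples as features.
pca.fit(train.T)
print(pca.mean_.shape)  # (60,): one mean per training *sample*

try:
    pca.transform(test)  # test has 100 columns, but the model expects 60
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)  # ValueError
```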

What is the issue? How can I apply the PCA to the test data?

Here is an example from debugging the PCA for mod1:

The transformed datasets mod1_features_train and mod2_features_train both have the correct size, 500x733. However, I cannot do the same with test_dataset_mod1 and test_dataset_mod2. Why?

While debugging, I noticed that PCA's base.py performs the operation X = X - self.mean_, where X is my test data and self.mean_ is the mean computed when fitting on the training set. self.mean_ has size 733, which does not match the 5000 columns of X. If I remove the transpose() in the training step, PCA runs without errors and the transformed test data for test_dataset_mod1 and test_dataset_mod2 has the correct size, 360x500. However, the training outputs (mod1_features_train and mod2_features_train) then have the wrong size, 5000x500. Why?
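
The 5000x500 shape comes from mixing up PCA's two outputs. `components_` holds the principal axes (one row per component, one column per feature), while `transform` or `fit_transform` returns the projected samples (one row per sample). A small sketch with hypothetical random data (60 training samples, 30 test samples, 100 features, 10 components) shows the difference:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
train = rng.normal(size=(60, 100))  # hypothetical: 60 samples x 100 features
test = rng.normal(size=(30, 100))   # hypothetical: 30 samples x 100 features

pca = PCA(n_components=10)
pca.fit(train)  # no transpose: rows are samples

axes = pca.components_                # principal axes: (n_components, n_features)
train_reduced = pca.transform(train)  # projected training samples
test_reduced = pca.transform(test)    # projected test samples

print(axes.shape)           # (10, 100)
print(axes.T.shape)         # (100, 10): the analogue of 5000x500 in the question
print(train_reduced.shape)  # (60, 10)
print(test_reduced.shape)   # (30, 10)
```

In other words, once fitting is done without the transpose, the reduced training data is `pca.transform(train)`, not `pca.components_`.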

Recommended answer

You shouldn't transpose your matrix in the fit function; or, if you have to, you must also transpose the matrix you pass to the transform function:

pca.fit(train_dataset_mod1)
pca2.fit(train_dataset_mod2)
mod1_features_test = pca.transform(test_dataset_mod1)
mod2_features_test = pca2.transform(test_dataset_mod2)
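
Applied to the function from the question, the first option might look like the sketch below. It assumes the intended training features are the projected training samples (`fit_transform`) rather than `components_`; the `n_components` parameter and the random shapes in the usage lines are hypothetical additions for illustration. With the real data and `n_components=200`, the outputs would be 733x200, 733x200, 360x200 and 360x200.

```python
import numpy as np
from sklearn.decomposition import PCA


def dimensionality_reduction(train_dataset_mod1, train_dataset_mod2,
                             test_dataset_mod1, test_dataset_mod2,
                             n_components=200):
    """Fit one PCA per modality on the training data, then project both splits."""
    pca = PCA(n_components=n_components)
    pca2 = PCA(n_components=n_components)

    # Fit on training samples (rows) and return their projections.
    mod1_features_train = pca.fit_transform(train_dataset_mod1)
    mod2_features_train = pca2.fit_transform(train_dataset_mod2)

    # Reuse the fitted models (same mean_ and components_) on the test data.
    mod1_features_test = pca.transform(test_dataset_mod1)
    mod2_features_test = pca2.transform(test_dataset_mod2)

    return mod1_features_train, mod2_features_train, mod1_features_test, mod2_features_test


# Hypothetical usage with small random matrices (samples x features).
rng = np.random.default_rng(0)
out = dimensionality_reduction(rng.normal(size=(60, 100)), rng.normal(size=(60, 150)),
                               rng.normal(size=(30, 100)), rng.normal(size=(30, 150)),
                               n_components=10)
print([a.shape for a in out])  # [(60, 10), (60, 10), (30, 10), (30, 10)]
```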

Or:

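pca.fit(train_dataset_mod1.transpose())
pca2.fit(train_dataset_mod2.transpose())
# Caution: test_dataset_mod1.transpose() is 5000x360, i.e. it has 360 columns,
# while pca was fitted on 733 columns. With the shapes in the question this still
# raises a feature-mismatch error; it only works if the test set has exactly as
# many samples as the training set.
mod1_features_test = pca.transform(test_dataset_mod1.transpose())
mod2_features_test = pca2.transform(test_dataset_mod2.transpose())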
