Different results for PCA, truncated_svd and svds on numpy and sklearn


Question

In sklearn and numpy there are different ways to compute the first principal component, and each method gives me a different result. Why?

```python
import numpy as np
import scipy as sp
import scipy.sparse.linalg  # makes sp.sparse.linalg.svds available
import sklearn as sk
import sklearn.decomposition
import sklearn.preprocessing


def gen_data_3_1():
    """Generate data set 3.1: noisy copies of d1, d2 and d3 = -0.2*d1 + 0.9*d2."""
    m = 1000  # number of samples
    n = 10    # number of variables
    d1 = np.random.normal(loc=0, scale=100, size=(m, 1))
    d2 = np.random.normal(loc=0, scale=121, size=(m, 1))
    d3 = -0.2 * d1 + 0.9 * d2
    z = np.zeros(shape=(m, 1))  # placeholder column, dropped below

    for i in range(4):
        z = np.hstack([z, d1 + np.random.normal(size=(m, 1))])
    for i in range(4):
        z = np.hstack([z, d2 + np.random.normal(size=(m, 1))])
    for i in range(2):
        z = np.hstack([z, d3 + np.random.normal(size=(m, 1))])
    z = z[:, 1:n + 1]  # drop the placeholder column
    z = sk.preprocessing.scale(z, axis=0)
    return z


x = gen_data_3_1()  # generate the sample dataset

x = sk.preprocessing.scale(x)  # normalize the data
pca = sk.decomposition.PCA().fit(x)  # compute the PCA of x and print the first princ comp.
print("first pca components=", pca.components_[:, 0])
u, s, v = sp.sparse.linalg.svds(x)  # the first column of v.T is the first princ comp
print("first svd components=", v.T[:, 0])

# the first component is the first princ comp
trsvd = sk.decomposition.TruncatedSVD(n_components=3).fit(x)
print("first component TruncatedSVD=", trsvd.components_[0])
```

Output:

```
first pca components= [-0.04201262  0.49555992  0.53885401 -0.67007959  0.0217131  -0.02535204
  0.03105254 -0.07313795 -0.07640555 -0.00442718]
first svd components= [ 0.02535204 -0.1317925   0.12071112 -0.0323422   0.20165568 -0.25104996
 -0.0278177   0.17856688 -0.69344318  0.59089451]
first component TruncatedSVD= [-0.04201262 -0.04230353 -0.04213402 -0.04221069  0.4058159   0.40584108
  0.40581564  0.40584842  0.40872029  0.40870925]
```
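Part of the mismatch in this printout comes from indexing rather than from the algorithms. In scikit-learn, both PCA.components_ and TruncatedSVD.components_ have shape (n_components, n_features), so each row is a component and components_[:, 0] selects a column (the first feature's weight in every component). That is consistent with the first entry of the PCA printout, -0.04201262, being equal to the first entry of the TruncatedSVD printout. svds likewise returns the right singular vectors as rows of its third output, but it is not guaranteed to return the singular values in descending order, so v.T[:, 0] need not belong to the largest one. The sketch below compares the first component from each method with consistent indexing. It is a minimal illustration under assumptions: small synthetic standardized data stands in for gen_data_3_1(), align is a hypothetical helper, and TruncatedSVD is run with algorithm="arpack" so the comparison is not blurred by the default randomized solver.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.preprocessing import scale

# Assumed stand-in for gen_data_3_1(): one strong latent factor plus noise,
# standardized column-wise so the data is already centered.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 1))
x = scale(latent * np.linspace(0.5, 2.0, 10) + 0.3 * rng.normal(size=(500, 10)))

# PCA: components_ has shape (n_components, n_features); row 0 is the first PC.
pc_pca = PCA(svd_solver="full").fit(x).components_[0]

# TruncatedSVD uses the same row layout as PCA.
pc_tsvd = TruncatedSVD(n_components=3, algorithm="arpack").fit(x).components_[0]

# svds: rows of vt are right singular vectors, but do not rely on the order of s;
# take the row that belongs to the largest singular value.
_, s, vt = svds(x, k=3)
pc_svds = vt[np.argmax(s)]

# Dense reference: numpy.linalg.svd sorts singular values in descending order.
pc_np = np.linalg.svd(x, full_matrices=False)[2][0]


def align(v, ref):
    """Hypothetical helper: flip v's sign to match ref (singular vectors have arbitrary sign)."""
    return v if np.dot(v, ref) >= 0 else -v


for name, v in [("TruncatedSVD", pc_tsvd), ("svds", pc_svds), ("numpy svd", pc_np)]:
    print(name, np.allclose(align(v, pc_pca), pc_pca, atol=1e-6))
```

Selecting by np.argmax(s) rather than by position keeps the svds line correct whichever order the singular values come back in.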

Answer

Because PCA, SVD, and truncated SVD are not the same method. PCA calls SVD, but it centers the data first. TruncatedSVD computes only the first n_components singular vectors and does not center the data. svds is also different from a full svd: it is an iterative solver aimed at sparse matrices that computes only k singular triplets (k=6 by default).
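The centering difference can be checked directly. This is a minimal sketch under assumed synthetic data with a large mean offset (+50 on every column) so that centering matters; first_pc_via_svd and same_up_to_sign are hypothetical helper names. It compares PCA's first component with the top right singular vector of the centered data from numpy.linalg.svd, and with TruncatedSVD on both the raw and the manually centered data.

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD

# Assumed data: one latent factor plus noise, shifted by +50 so the mean is far from 0.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
x = latent * np.linspace(0.5, 2.0, 6) + 0.3 * rng.normal(size=(500, 6)) + 50.0


def first_pc_via_svd(a):
    """Hypothetical helper: top right singular vector of the column-centered data."""
    centered = a - a.mean(axis=0)
    return np.linalg.svd(centered, full_matrices=False)[2][0]


def same_up_to_sign(v, w, atol=1e-6):
    """Hypothetical helper: singular vectors are defined only up to sign."""
    return np.allclose(v, w, atol=atol) or np.allclose(v, -w, atol=atol)


pc_pca = PCA(n_components=1, svd_solver="full").fit(x).components_[0]

# TruncatedSVD on the raw data: no centering is applied.
pc_tsvd_raw = TruncatedSVD(n_components=1, algorithm="arpack").fit(x).components_[0]

# TruncatedSVD on data centered by hand.
x_centered = x - x.mean(axis=0)
pc_tsvd_centered = TruncatedSVD(n_components=1, algorithm="arpack").fit(x_centered).components_[0]

print("PCA == SVD of centered data:", same_up_to_sign(pc_pca, first_pc_via_svd(x)))
print("PCA == TruncatedSVD on raw data:", same_up_to_sign(pc_pca, pc_tsvd_raw))
print("PCA == TruncatedSVD on centered data:", same_up_to_sign(pc_pca, pc_tsvd_centered))
```

This should print True, False, True: PCA matches an SVD of the centered data, and TruncatedSVD matches PCA only once the data is centered. On the raw data the large offset makes TruncatedSVD's first component point roughly along the mean instead. Since x in the question is already standardized, centering alone does not explain the different PCA and TruncatedSVD printouts there.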
