Basic example for PCA with matplotlib

Question

I am trying to do a simple principal component analysis with matplotlib.mlab.PCA, but I can't find a clean way to solve my problem using the attributes of the class. Here's an example:

Generate some dummy 2D data and run PCA on it:

from matplotlib.mlab import PCA
import numpy as np

N     = 1000
xTrue = np.linspace(0,1000,N)
yTrue = 3*xTrue

xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data  = np.hstack((xData, yData))
test2PCA = PCA(data)

Now I just want to get the principal components as vectors in my original coordinates and plot them as arrows on top of my data.

What is a quick and clean way to get there?

Thanks, Tyrax

Recommended answer

I don't think the mlab.PCA class is appropriate for what you want to do. In particular, the PCA class rescales the data before finding the eigenvectors:

a = self.center(a)
U, s, Vh = np.linalg.svd(a, full_matrices=False)

The center method subtracts the mean and then divides by sigma:

def center(self, x):
    'center the data using the mean and sigma from training set a'
    return (x - self.mu)/self.sigma

This results in eigenvectors, pca.Wt, like this:

[[-0.70710678 -0.70710678]
 [-0.70710678  0.70710678]]

They are perpendicular, but not directly relevant to the principal axes of your original data. They are the principal axes of the rescaled data, in which each column has been divided by its standard deviation.
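The reason the rescaling produces these particular vectors: for two variables, the covariance matrix of standardized data is the correlation matrix [[1, r], [r, 1]], and its eigenvectors are (1, 1)/sqrt(2) and (1, -1)/sqrt(2) for any nonzero r. The following is a minimal sketch that checks this with NumPy alone. The seed and variable names are illustrative assumptions and do not come from the original post.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
N = 1000
x_true = np.linspace(0, 1000, N)
x = x_true + rng.normal(0, 100, N)
y = 3 * x_true + rng.normal(0, 100, N)
data = np.column_stack((x, y))

# Standardize each column: subtract the mean, divide by the standard deviation
standardized = (data - data.mean(axis=0)) / data.std(axis=0)

# Rows of Vh are the principal axes of the standardized data
_, _, Vh = np.linalg.svd(standardized, full_matrices=False)
print(np.abs(Vh))  # every entry is close to 0.7071; only the signs vary
```

Whatever the slope of the underlying line, the axes of the standardized data sit at 45 degrees, which is why they say nothing about the direction of the original point cloud.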

It may be easier to code what you want directly, without the mlab.PCA class:

import numpy as np
import matplotlib.pyplot as plt

N = 1000
xTrue = np.linspace(0, 1000, N)
yTrue = 3 * xTrue
xData = xTrue + np.random.normal(0, 100, N)
yData = yTrue + np.random.normal(0, 100, N)
xData = np.reshape(xData, (N, 1))
yData = np.reshape(yData, (N, 1))
data = np.hstack((xData, yData))

mu = data.mean(axis=0)
data = data - mu
# data = (data - mu)/data.std(axis=0)  # Uncommenting this reproduces mlab.PCA results
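# The columns of the first return value are the principal axes;
# the second return value holds singular values, not eigenvalues.
eigenvectors, singular_values, V = np.linalg.svd(data.T, full_matrices=False)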
projected_data = np.dot(data, eigenvectors)
sigma = projected_data.std(axis=0).mean()
print(eigenvectors)

fig, ax = plt.subplots()
ax.scatter(xData, yData)
for axis in eigenvectors.T:  # iterate over columns; each column is one principal axis
    start, end = mu, mu + sigma * axis
    ax.annotate(
        '', xy=end, xycoords='data',
        xytext=start, textcoords='data',
        arrowprops=dict(facecolor='red', width=2.0))
ax.set_aspect('equal')
plt.show()
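The same axes can also be read off the covariance matrix, which gives a separate spread for each axis instead of one averaged sigma. The sketch below is an alternative to the code above, not the answer's own method: principal_axes is a hypothetical helper name, and the seed is an arbitrary choice for reproducibility. It relies only on np.cov and np.linalg.eigh.

```python
import numpy as np

def principal_axes(data):
    """Return (mean, axes, stds) for an (N, d) array.

    Hypothetical helper, not part of matplotlib or NumPy.
    axes holds one unit vector per row, ordered by decreasing variance.
    """
    mu = data.mean(axis=0)
    cov = np.cov(data - mu, rowvar=False)     # d x d covariance matrix
    variances, vectors = np.linalg.eigh(cov)  # ascending eigenvalues, eigenvectors in columns
    order = np.argsort(variances)[::-1]       # largest variance first
    return mu, vectors[:, order].T, np.sqrt(variances[order])

rng = np.random.default_rng(0)  # fixed seed, illustrative only
N = 1000
x_true = np.linspace(0, 1000, N)
data = np.column_stack((x_true + rng.normal(0, 100, N),
                        3 * x_true + rng.normal(0, 100, N)))

mu, axes, stds = principal_axes(data)
for axis, std in zip(axes, stds):
    start, end = mu, mu + std * axis  # arrow endpoints in the original coordinates
    print(start, end)
```

Each start and end pair can be passed to ax.annotate exactly as in the plotting loop above. Scaling each arrow by its own standard deviation makes the long and short axes of the point cloud visually distinct.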
