keras autoencoder vs PCA

Question

I am playing with a toy example to understand PCA vs a Keras autoencoder.

I have the following code for understanding PCA:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from sklearn import decomposition
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
pca = decomposition.PCA(n_components=3)
pca.fit(X)

pca.explained_variance_ratio_
# array([ 0.92461621,  0.05301557,  0.01718514])

pca.components_
# array([[ 0.36158968, -0.08226889,  0.85657211,  0.35884393],
#        [ 0.65653988,  0.72971237, -0.1757674 , -0.07470647],
#        [-0.58099728,  0.59641809,  0.07252408,  0.54906091]])

I have done some reading and played with Keras code, including this one.

However, the reference code feels like too big a leap for my level of understanding.

Does someone have a short autoencoder example which can show me:

(1) how to pull the first 3 components from the autoencoder,

(2) how to understand what amount of variance the autoencoder captures, and

(3) how the autoencoder components compare against the PCA components?

Answer

First of all, the aim of an autoencoder is to learn a representation (encoding) for a set of data, typically for the purpose of dimensionality reduction. So the target output of the autoencoder is the autoencoder input itself.

It is shown in [1] that if there is one linear hidden layer and the mean squared error criterion is used to train the network, then the k hidden units learn to project the input onto the span of the first k principal components of the data. And in [2] you can see that if the hidden layer is nonlinear, the autoencoder behaves differently from PCA, with the ability to capture multi-modal aspects of the input distribution.
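To make the claim in [1] concrete, here is a minimal sketch of a linear autoencoder on the same iris data as in the question: a 3-unit linear bottleneck trained with MSE, whose encoder output plays the role of pca.transform(X). The variable names (X_centered, encoder, Z) are just illustrative.

import numpy as np
from sklearn import datasets, decomposition
from keras.models import Model
from keras.layers import Input, Dense

iris = datasets.load_iris()
X = iris.data.astype('float32')
X_centered = X - X.mean(axis=0)      # PCA works on mean-centered data

inp = Input(shape=(4,))
code = Dense(3, activation='linear', use_bias=False)(inp)    # 3-unit linear bottleneck
recon = Dense(4, activation='linear', use_bias=False)(code)  # linear reconstruction
ae = Model(inp, recon)
ae.compile(optimizer='adam', loss='mse')
ae.fit(X_centered, X_centered, epochs=1000, batch_size=16, verbose=0)

encoder = Model(inp, code)
Z = encoder.predict(X_centered)      # the 3 "components" per sample, analogous to pca.transform(X)

pca = decomposition.PCA(n_components=3).fit(X_centered)
scores = pca.transform(X_centered)
# The columns of Z are in general a rotation/scaling of scores, but they span
# (approximately) the same 3-dimensional subspace.

With nonlinear (relu) activations, as in the MNIST example below, this equivalence no longer holds.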

Autoencoders are data-specific, which means that they will only be able to compress data similar to what they have been trained on. So the usefulness of the features learned by the hidden layers can be used to evaluate the efficacy of the method.

For this reason, one way to evaluate an autoencoder's efficacy for dimensionality reduction is to cut the network at the middle hidden layer and compare the accuracy/performance of your target algorithm on this reduced data with its performance on the original data. Generally, PCA is a linear method, while autoencoders are usually nonlinear. Mathematically it is hard to compare them directly, but intuitively I provide an example of dimensionality reduction on the MNIST dataset using an autoencoder, for your better understanding. The code is here:

from keras.datasets import mnist 
from keras.models import Model 
from keras.layers import Input, Dense 
from keras.utils import np_utils 
import numpy as np

num_train = 60000
num_test = 10000

height, width, depth = 28, 28, 1 # MNIST images are 28x28
num_classes = 10 # there are 10 classes (1 per digit)

(X_train, y_train), (X_test, y_test) = mnist.load_data()

X_train = X_train.reshape(num_train, height * width)
X_test = X_test.reshape(num_test, height * width)
X_train = X_train.astype('float32') 
X_test = X_test.astype('float32')

X_train /= 255 # Normalise data to [0, 1] range
X_test /= 255 # Normalise data to [0, 1] range

Y_train = np_utils.to_categorical(y_train, num_classes) # One-hot encode the labels
Y_test = np_utils.to_categorical(y_test, num_classes) # One-hot encode the labels

input_img = Input(shape=(height * width,))

x = Dense(height * width, activation='relu')(input_img)

encoded = Dense(height * width//2, activation='relu')(x)
encoded = Dense(height * width//8, activation='relu')(encoded)

y = Dense(height * width//256, activation='relu')(encoded)  # 3-unit bottleneck (784//256 == 3)

decoded = Dense(height * width//8, activation='relu')(y)
decoded = Dense(height * width//2, activation='relu')(decoded)

z = Dense(height * width, activation='sigmoid')(decoded)
model = Model(input_img, z)

model.compile(optimizer='adadelta', loss='mse') # reconstruction loss

model.fit(X_train, X_train,
      epochs=10,
      batch_size=128,
      shuffle=True,
      validation_data=(X_test, X_test))

mid = Model(input_img, y)  # encoder: maps an image to its 3-dimensional code
reduced_representation = mid.predict(X_test)

out = Dense(num_classes, activation='softmax')(y)
reduced = Model(input_img, out)
reduced.compile(loss='categorical_crossentropy',
          optimizer='adam', 
          metrics=['accuracy']) 

reduced.fit(X_train, Y_train,
      epochs=10,
      batch_size=128,
      shuffle=True,
      validation_data=(X_test, Y_test))


scores = reduced.evaluate(X_test, Y_test, verbose=1)
print("Accuracy: ", scores[1])

It produces a $y \in \mathbb{R}^{3}$ (almost like what you get from decomposition.PCA(n_components=3)). For example, here you can see the outputs of layer y for a digit 5 instance in the dataset:

  class  y_1    y_2     y_3     
  5      87.38  0.00    20.79
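To compare these 3-dimensional codes against the first 3 PCA components of the same test data (question (3)), one rough check is to regress one representation on the other and look at the R^2 score: values close to 1 mean the two representations span nearly the same subspace. A sketch, assuming the mid model and X_test from the code above:

from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

codes = mid.predict(X_test)                              # autoencoder codes, shape (10000, 3)
pca_scores = PCA(n_components=3).fit_transform(X_test)   # first 3 PCA scores, shape (10000, 3)

# If the two representations capture similar structure, each predicts the other well.
r2_pca_from_codes = LinearRegression().fit(codes, pca_scores).score(codes, pca_scores)
r2_codes_from_pca = LinearRegression().fit(pca_scores, codes).score(pca_scores, codes)
print(r2_pca_from_codes, r2_codes_from_pca)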

As you can see in the code above, we can take the output of layer y directly with the model mid:

mid = Model(input_img, y)
reduced_representation = mid.predict(X_test)

and when y is connected to a softmax dense layer (the model reduced above), we get a good classification accuracy of about 95%. So it is reasonable to say that y is an efficiently extracted feature vector for the dataset.
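Regarding question (2): an autoencoder has no direct analogue of explained_variance_ratio_, but a rough proxy is the fraction of the data's total variance that survives reconstruction, i.e. 1 - (mean squared reconstruction error) / (total variance). For PCA with k components this quantity equals pca.explained_variance_ratio_[:k].sum(), so it gives a comparable number. A sketch, assuming the trained autoencoder (the variable model) and X_test from the code above:

import numpy as np

def fraction_of_variance_explained(autoencoder, X):
    """Share of the total variance that the autoencoder's reconstruction preserves."""
    X_hat = autoencoder.predict(X)
    residual = np.mean(np.sum((X - X_hat) ** 2, axis=1))        # mean squared reconstruction error
    total = np.mean(np.sum((X - X.mean(axis=0)) ** 2, axis=1))  # total variance around the mean
    return 1.0 - residual / total

print(fraction_of_variance_explained(model, X_test))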

References:

[1] Bourlard, Hervé, and Yves Kamp. "Auto-association by multilayer perceptrons and singular value decomposition." Biological Cybernetics 59.4 (1988): 291-294.

[2] Japkowicz, Nathalie, Stephen Jose Hanson, and Mark A. Gluck. "Nonlinear autoassociation is not equivalent to PCA." Neural Computation 12.3 (2000): 531-545.
