Keras - Autoencoder accuracy stuck on zero


Question

I'm trying to detect fraud using an autoencoder and Keras. I've written the following code as a notebook:

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
from sklearn.preprocessing import StandardScaler
from keras.layers import Input, Dense
from keras.models import Model
import matplotlib.pyplot as plt

data = pd.read_csv('../input/creditcard.csv')
data['normAmount'] = StandardScaler().fit_transform(data['Amount'].values.reshape(-1, 1))
data = data.drop(['Time','Amount'],axis=1)

data = data[data.Class != 1]
X = data.loc[:, data.columns != 'Class']

encodingDim = 7
inputShape = X.shape[1]
inputData = Input(shape=(inputShape,))

X = X.to_numpy()  # DataFrame.as_matrix() was removed in pandas 1.0

encoded = Dense(encodingDim, activation='relu')(inputData)
decoded = Dense(inputShape, activation='sigmoid')(encoded)
autoencoder = Model(inputData, decoded)
encoder = Model(inputData, encoded)
encodedInput = Input(shape=(encodingDim,))
decoderLayer = autoencoder.layers[-1]
decoder = Model(encodedInput, decoderLayer(encodedInput))

autoencoder.summary()

autoencoder.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
history = autoencoder.fit(X, X,
                epochs=10,
                batch_size=256,
                validation_split=0.33)

print(history.history.keys())
# summarize history for accuracy
# note: newer Keras versions name these keys 'accuracy' / 'val_accuracy'
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'test'], loc='upper left')
plt.show()

I'm probably missing something: my accuracy is stuck at 0, and my test loss is lower than my train loss.

Any insight would be welcome.

Answer

Accuracy has little meaning for an autoencoder, especially in a fraud-detection setting. The point is that accuracy is not well defined on regression tasks: is it "accurate" to say that 0.1 is the same as 0.11? For Keras's accuracy metric it is not. If you want to see how well your algorithm replicates the data, look at the MSE or at the reconstructions themselves. Many autoencoders use MSE as their loss function.
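The idea can be sketched in a few lines of numpy. This is an illustrative example, not the asker's code: `X_pred` stands in for the output of `autoencoder.predict(X)`, and the per-sample MSE is the quantity worth watching instead of accuracy.

```python
import numpy as np

def reconstruction_mse(X, X_pred):
    """Per-sample mean squared reconstruction error."""
    return np.mean(np.square(X - X_pred), axis=1)

X      = np.array([[0.10, 0.50], [0.90, 0.20]])
X_pred = np.array([[0.11, 0.48], [0.60, 0.70]])  # stand-in for model.predict(X)

errors = reconstruction_mse(X, X_pred)
# The first row is reconstructed almost perfectly and the second poorly,
# yet an exact-match "accuracy" would count both rows as wrong.
```

Note how the MSE grades "0.11 vs 0.10" as a near-perfect reconstruction, which is exactly the distinction an accuracy metric cannot express.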

The metric you should be monitoring is the training loss on good examples versus the validation loss on fraudulent examples. There you can easily see whether you fit the real examples more closely than the fraudulent ones, and how well your algorithm performs in practice.
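A minimal sketch of that comparison, with made-up numbers: given per-sample reconstruction errors (e.g. from the `reconstruction_mse` idea above applied to a held-out set that still contains its `Class` labels), average them separately for normal and fraudulent rows. A useful model should show a clearly lower error on the normal rows.

```python
import numpy as np

def split_scores(errors, labels):
    """Mean reconstruction error on normal (label 0) vs fraud (label 1) rows."""
    errors = np.asarray(errors)
    labels = np.asarray(labels)
    return errors[labels == 0].mean(), errors[labels == 1].mean()

# Toy per-sample errors: the model reconstructs normal rows (0) well
# and fraudulent rows (1) badly, which is the behaviour we want.
errors = [0.010, 0.020, 0.40, 0.015, 0.55]
labels = [0,     0,     1,    0,     1]

normal_mean, fraud_mean = split_scores(errors, labels)
```

The gap between the two means (or a threshold between them) is what turns the autoencoder into a practical anomaly detector.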

Another design choice I would avoid is ReLU in an autoencoder. ReLU works well in deeper models because of its simplicity and its effectiveness at combating vanishing/exploding gradients. Neither of those concerns applies to a shallow autoencoder, while the information ReLU discards (every negative pre-activation is clipped to zero) does hurt one. I would suggest tanh as the intermediate activation function.
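The information-loss argument can be seen directly on a few sample pre-activations (a standalone numpy illustration, not part of the model above): ReLU maps every negative value to the same code, so the decoder cannot tell them apart, while tanh squashes them but keeps them distinguishable.

```python
import numpy as np

x = np.array([-2.0, -0.5, 0.5, 2.0])  # hypothetical encoder pre-activations

relu_out = np.maximum(x, 0.0)  # all negatives collapse to 0.0
tanh_out = np.tanh(x)          # negatives stay distinct, just squashed

# relu_out maps -2.0 and -0.5 to the identical code 0.0;
# tanh_out preserves their sign and relative magnitude.
```

In the asker's model this would mean swapping `activation='relu'` for `activation='tanh'` in the encoding `Dense` layer (with inputs scaled to a range tanh can represent).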
