Keras: model.evaluate vs model.predict accuracy difference in multi-class NLP task


Problem description

I am training a simple model in Keras for an NLP task with the following code. The variable names are self-explanatory for the train, test and validation sets. The dataset has 19 classes, so the final layer of the network has 19 outputs. The labels are also one-hot encoded.

# Imports implied by the snippet (Keras with the np_utils helper)
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dropout, BatchNormalization, Dense
from keras.utils import np_utils

nb_classes = 19
model1 = Sequential()
model1.add(Embedding(nb_words,
                     EMBEDDING_DIM,
                     weights=[embedding_matrix],
                     input_length=MAX_SEQUENCE_LENGTH,
                     trainable=False))
model1.add(LSTM(num_lstm, dropout=rate_drop_lstm, recurrent_dropout=rate_drop_lstm))
model1.add(Dropout(rate_drop_dense))
model1.add(BatchNormalization())
model1.add(Dense(num_dense, activation=act))
model1.add(Dropout(rate_drop_dense))
model1.add(BatchNormalization())

model1.add(Dense(nb_classes, activation='sigmoid'))

model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# One-hot encode all labels
ytrain_enc = np_utils.to_categorical(train_labels)
yval_enc = np_utils.to_categorical(val_labels)
ytestenc = np_utils.to_categorical(test_labels)

model1.fit(train_data, ytrain_enc,
           validation_data=(val_data, yval_enc),
           epochs=200,
           batch_size=384,
           shuffle=True,
           verbose=1)

After the first epoch, this gives me the following output.

Epoch 1/200
216632/216632 [==============================] - 2442s - loss: 0.1427 - acc: 0.9443 - val_loss: 0.0526 - val_acc: 0.9826

Then I evaluate my model on the test set, and this also shows an accuracy of around 0.98.

model1.evaluate(test_data, y=ytestenc, batch_size=384, verbose=1)
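
As a side note, model1.evaluate returns the loss followed by each compiled metric; a quick sketch, reusing the trained model and test arrays from above, to check which returned number is which:

scores = model1.evaluate(test_data, ytestenc, batch_size=384, verbose=1)
# model1.metrics_names lists the order of the returned values,
# e.g. ['loss', 'acc'] when the model was compiled with metrics=['accuracy'].
for name, value in zip(model1.metrics_names, scores):
    print(name, value)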

However, the labels are one-hot encoded, so I need the vector of predicted classes in order to generate a confusion matrix, etc. So I use:

PREDICTED_CLASSES = model1.predict_classes(test_data, batch_size=384, verbose=1)
temp = sum(test_labels == PREDICTED_CLASSES)  # element-wise match of integer labels vs. predictions
temp / len(test_labels)
# 0.83
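
An equivalent way to get the same number, plus a confusion matrix, is to argmax both the predicted probabilities and the one-hot labels. This is a minimal sketch, assuming scikit-learn is available (it is not part of the original question) and reusing the variables above:

import numpy as np
from sklearn.metrics import confusion_matrix

probs = model1.predict(test_data, batch_size=384, verbose=1)  # per-class scores for each sample
pred_classes = np.argmax(probs, axis=1)     # same result as predict_classes for this model
true_classes = np.argmax(ytestenc, axis=1)  # recover integer labels from the one-hot encoding

manual_acc = np.mean(pred_classes == true_classes)
cm = confusion_matrix(true_classes, pred_classes)
print(manual_acc)
print(cm)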

This shows that the predicted classes were only 83% accurate, yet model1.evaluate reports 98% accuracy! What am I doing wrong here? Is my loss function suitable for categorical class labels? Is my choice of a sigmoid activation for the prediction layer okay? Or does Keras evaluate a model differently? Please suggest what could be wrong. This is my first attempt at building a deep model, so I don't have much understanding of what's wrong here.

Recommended answer

I have found the problem. metrics=['accuracy'] chooses the accuracy metric automatically based on the cost function, so with binary_crossentropy it reports binary accuracy, not categorical accuracy. Switching to categorical_crossentropy automatically selects categorical accuracy, which now matches the value computed manually with model1.predict(). Yu-Yang was right to point out the cost function and activation function for a multi-class problem.
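
Concretely, a minimal sketch of the two lines that would change relative to the model above, assuming a single-label multi-class setup as the answer describes (softmax output plus categorical cross-entropy):

# Final layer and compile step for the corrected setup; with this loss,
# metrics=['accuracy'] resolves to categorical accuracy.
model1.add(Dense(nb_classes, activation='softmax'))
model1.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])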

P.S.: use metrics=['binary_accuracy', 'categorical_accuracy']
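
For example, compiling with both metrics makes the discrepancy visible during training; with binary_crossentropy as the loss, binary_accuracy is the ~0.98 figure and categorical_accuracy matches the manual ~0.83 computation:

model1.compile(loss='binary_crossentropy',
               optimizer='adam',
               metrics=['binary_accuracy', 'categorical_accuracy'])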
