Keras: model.evaluate vs model.predict accuracy difference in multi-class NLP task

Problem description

I am training a simple model in Keras for an NLP task with the following code. Variable names are self-explanatory for the train, test and validation sets. The dataset has 19 classes, so the final layer of the network has 19 outputs. Labels are also one-hot encoded.

from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense, Dropout, BatchNormalization
from keras.utils import np_utils

nb_classes = 19
model1 = Sequential()
model1.add(Embedding(nb_words,
                     EMBEDDING_DIM,
                     weights=[embedding_matrix],
                     input_length=MAX_SEQUENCE_LENGTH,
                     trainable=False))
model1.add(LSTM(num_lstm, dropout=rate_drop_lstm, recurrent_dropout=rate_drop_lstm))
model1.add(Dropout(rate_drop_dense))
model1.add(BatchNormalization())
model1.add(Dense(num_dense, activation=act))
model1.add(Dropout(rate_drop_dense))
model1.add(BatchNormalization())

model1.add(Dense(nb_classes, activation = 'sigmoid'))


model1.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
#One hot encode all labels
ytrain_enc = np_utils.to_categorical(train_labels)
yval_enc = np_utils.to_categorical(val_labels)
ytestenc = np_utils.to_categorical(test_labels)

model1.fit(train_data, ytrain_enc,
             validation_data=(val_data, yval_enc),
             epochs=200,
             batch_size=384,
             shuffle=True,
             verbose=1)

After the first epoch, this gives me the following output.

Epoch 1/200
216632/216632 [==============================] - 2442s - loss: 0.1427 - acc: 0.9443 - val_loss: 0.0526 - val_acc: 0.9826

Then I evaluate my model on the test dataset, and this also shows accuracy around 0.98.

model1.evaluate(test_data, y = ytestenc, batch_size=384, verbose=1)

However, the labels are one-hot encoded, so I need the predicted class vector in order to generate a confusion matrix, etc. So I use:

PREDICTED_CLASSES = model1.predict_classes(test_data, batch_size=384, verbose=1)
temp = sum(test_labels == PREDICTED_CLASSES)
temp / len(test_labels)  # 0.83
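
For reference, a minimal sketch of how a confusion matrix could be built from these predictions; this is an illustration rather than part of the original post, and it assumes test_labels holds integer class IDs (as the comparison above implies) and that scikit-learn is available:

import numpy as np
from sklearn.metrics import confusion_matrix

# PREDICTED_CLASSES from predict_classes() are integer class indices (0..18);
# with raw probabilities, np.argmax(probs, axis=1) would give the same thing.
cm = confusion_matrix(test_labels, PREDICTED_CLASSES, labels=np.arange(nb_classes))
print(cm.shape)  # (19, 19)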

This shows that the predicted classes were only 83% accurate, yet model1.evaluate reports 98% accuracy! What am I doing wrong here? Is my loss function okay with categorical class labels? Is my choice of sigmoid activation for the prediction layer okay? Or does Keras evaluate a model differently? Please suggest what might be wrong. This is my first attempt at building a deep model, so I don't have much understanding of what's going wrong here.

Recommended answer

I have found the problem. metrics=['accuracy'] chooses the accuracy metric automatically based on the cost function. So with binary_crossentropy it reports binary accuracy, not categorical accuracy. Using categorical_crossentropy automatically switches to categorical accuracy, and then it matches the accuracy computed manually with model1.predict(). Yu-Yang was right to point out the cost function and activation function for a multi-class problem.
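
As a hypothetical numeric illustration of why the two numbers diverge (the values below are made up, not from the post): with 19 sigmoid outputs and one-hot targets, binary accuracy counts every near-zero output as a correct "negative", so it stays high even when the argmax is wrong:

import numpy as np

# One made-up sample with 19 classes: the true class is 3, but the argmax of the prediction is 7.
y_true = np.zeros(19)
y_true[3] = 1.0
y_pred = np.full(19, 0.05)
y_pred[7] = 0.30

binary_acc = np.mean(np.round(y_pred) == y_true)                 # 18/19 ≈ 0.947
categorical_acc = float(np.argmax(y_pred) == np.argmax(y_true))  # 0.0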

P.S.: You can get both at once by using metrics=['binary_accuracy', 'categorical_accuracy'].
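
A minimal sketch of what the fix could look like, assuming the rest of the model stays the same; the softmax output layer and the categorical_crossentropy loss follow the answer above, with both accuracy metrics tracked explicitly:

# Final layer and compile call with the fix applied (replacing the sigmoid layer above).
model1.add(Dense(nb_classes, activation='softmax'))
model1.compile(loss='categorical_crossentropy',
               optimizer='adam',
               metrics=['binary_accuracy', 'categorical_accuracy'])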
