Macro metrics (recall/F1...) for multiclass CNN


Question

I am using a CNN for image classification on an unbalanced dataset. I'm totally new to the TensorFlow backend. It is a multiclass problem (not multilabel) with 16 classes, and the classes are one-hot encoded.

I want to compute macro metrics for each epoch: F1, precision and recall.

I found code that prints these macro metrics, but it only works on the validation set. From: https://medium.com/@thongonary/how-to-compute-f1-score-for-each-epoch-in-keras-a1acd17715a2

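import numpy as np
from keras.callbacks import Callback
from sklearn.metrics import f1_score, precision_score, recall_score

class Metrics(Callback):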

 def on_train_begin(self, logs={}):
  self.val_f1s = []
  self.val_recalls = []
  self.val_precisions = []

 def on_epoch_end(self, epoch, logs={}):
  val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()
  val_targ = self.validation_data[1]
  _val_f1 = f1_score(val_targ, val_predict,average='macro')
  _val_recall = recall_score(val_targ, val_predict,average='macro')
  _val_precision = precision_score(val_targ, val_predict,average='macro')
  self.val_f1s.append(_val_f1)
  self.val_recalls.append(_val_recall)
  self.val_precisions.append(_val_precision)
  print (" — val_f1: %f — val_precision: %f — val_recall %f" % (_val_f1, _val_precision, _val_recall))
  return

metrics = Metrics()

Since the callback uses

 val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()

does the round() lead to errors when doing multiclass classification?

I use the following code to print the metrics (only recall, since that is the important metric for me) on the training set. It is also computed on the validation set, since the metric is passed to model.compile. The code has been adapted from: Custom macro for recall in keras



from keras import backend as K

def recall(y_true,y_pred):
     true_positives = K.sum(K.round(K.clip(y_true * y_pred, 0, 1)))
     possible_positives = K.sum(K.round(K.clip(y_true, 0, 1)))
     return  true_positives / (possible_positives + K.epsilon())

def unweightedRecall(y_true, y_pred):
    return (recall(y_true[:,0],y_pred[:,0]) + recall(y_true[:,1],y_pred[:,1])+recall(y_true[:,2],y_pred[:,2]) + recall(y_true[:,3],y_pred[:,3])
            +recall(y_true[:,4],y_pred[:,4]) + recall(y_true[:,5],y_pred[:,5])
            +recall(y_true[:,6],y_pred[:,6]) + recall(y_true[:,7],y_pred[:,7])
            +recall(y_true[:,8],y_pred[:,8]) + recall(y_true[:,9],y_pred[:,9])
            +recall(y_true[:,10],y_pred[:,10]) + recall(y_true[:,11],y_pred[:,11])
            +recall(y_true[:,12],y_pred[:,12]) + recall(y_true[:,13],y_pred[:,13])
            +recall(y_true[:,14],y_pred[:,14]) + recall(y_true[:,15],y_pred[:,15]))/16.    



I run my model with:

model.compile(optimizer="adam", loss="categorical_crossentropy",metrics=[unweightedRecall,"accuracy"])   #model compilation with unweightedRecall metrics

train =model.fit_generator(image_gen.flow(train_X, train_label, batch_size=64),epochs=100,verbose=1,validation_data=(valid_X, valid_label),class_weight=class_weights,callbacks=[metrics],steps_per_epoch=len(train_X)/64)  #run the model

The validation macro recall differs between the two pieces of code.

For example (compare val_unweightedRecall and val_recall):

Epoch 10/100
19/18 [===============================] - 13s 703ms/step - loss: 1.5167 - unweightedRecall: 0.1269 - acc: 0.5295 - val_loss: 1.5339 - val_unweightedRecall: 0.1272 - val_acc: 0.5519
 — val_f1: 0.168833 — val_precision: 0.197502 — val_recall 0.15636

Why do I get different values for the validation macro recall from the two pieces of code?

Bonus question: for people who have already tried this, is it really worth using a custom loss based on the metric we care about (recall, for example), or does categorical cross-entropy with class weights produce the same result?

Recommended answer

Let me answer both questions, but in the opposite order:

You can't use recall as the basis for a custom loss: it is not convex (and, once predictions are thresholded into hard class decisions, it has no useful gradient either). If you do not fully understand why recall, precision or F1 can't be used as a loss, please take the time to look at the role the loss plays; it is, after all, one of the most important choices in your model.
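One way to see the practical problem is the small NumPy sketch below. It is an illustration only, not the answer's own argument: the function `hard_recall` and the sample values are made up, and a binary case is used for simplicity. It shows that nudging every predicted probability slightly leaves the thresholded recall unchanged, so the metric gives an optimizer nothing to follow.

```python
import numpy as np

def hard_recall(y_true, probs):
    """Recall of the positive class after thresholding probabilities at 0.5."""
    pred = (probs >= 0.5).astype(int)
    tp = np.sum((y_true == 1) & (pred == 1))
    return tp / np.sum(y_true == 1)

# Hypothetical labels and predicted probabilities (values made up).
y_true = np.array([1, 1, 0, 1])
probs = np.array([0.9, 0.4, 0.2, 0.6])

before = hard_recall(y_true, probs)
after = hard_recall(y_true, probs + 0.01)  # small nudge to every output

# Both values are identical: the metric is flat around this point.
print(before, after)
```

Cross-entropy, by contrast, changes smoothly with the predicted probabilities, which is why it is the usual choice of loss while recall is only monitored.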

Indeed, the round is intended for a binary problem. As they say, if it's not you then it's the other. But in your case it's wrong. Let's go through the code:

val_predict = (np.asarray(self.model.predict(self.validation_data[0]))).round()

From the inside out, it takes the data (self.validation_data[0]) and predicts a number (one output neuron). As such, it computes the probability of being a 1. If this probability is over 0.5, the round turns it into a 1; if it is under, into a 0. As you can see, this is wrong for you: with 16 outputs, it can happen that no output exceeds 0.5, and then no class is predicted at all. Following this mistake, the rest is also wrong.
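To make the failure concrete, here is a small NumPy sketch. The probability values are made up for illustration, and the names `probs`, `rounded` and `labels` are not from the original code. It builds a softmax-like output over 16 classes in which no entry exceeds 0.5, so rounding yields an all-zero row, while argmax still selects a class.

```python
import numpy as np

# Hypothetical softmax output for one sample over 16 classes (values made up).
probs = np.full((1, 16), 0.04)
probs[0, 3] = 0.40  # most likely class, but still below 0.5 (row sums to 1.0)

rounded = probs.round()        # what the callback does
labels = probs.argmax(axis=1)  # hard decision suited to a multiclass model

print(rounded.sum())  # total number of classes marked as predicted
print(labels)         # index of the most probable class
```

A possible fix for the callback, under the same assumptions, would be to take the argmax of both the targets and the predictions and pass those label indices to the scikit-learn scorers instead of the rounded arrays.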

Now, the solution. You want to compute the mean recall at every step. By the way, regarding "but it only works on the validation set": yes, that is intended. You use the validation set to validate the model, not the training set; otherwise it is cheating.

So recall is equal to the true positives over all actual positives. Let's do that!
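As a worked example of that definition, the NumPy sketch below computes recall per class and then averages it. The helper `macro_recall` and the toy labels are hypothetical, and counting a class with no true samples as 0 is an assumption of this sketch.

```python
import numpy as np

def macro_recall(true_labels, pred_labels, n_classes):
    """Mean of per-class recall; classes with no true samples count as 0."""
    recalls = []
    for c in range(n_classes):
        actual = (true_labels == c)
        tp = np.sum(actual & (pred_labels == c))  # true positives for class c
        n_actual = np.sum(actual)                 # all actual positives for class c
        recalls.append(tp / n_actual if n_actual > 0 else 0.0)
    return float(np.mean(recalls)), recalls

# Toy labels (made up): class 0 dominates, class 1 is always missed.
true_labels = np.array([0, 0, 0, 0, 1, 2])
pred_labels = np.array([0, 0, 0, 0, 0, 2])

score, per_class = macro_recall(true_labels, pred_labels, n_classes=3)
print(per_class, score)
```

Here five of the six samples are classified correctly, yet the macro recall is only 2/3, because the missed minority class weighs as much as the majority class. That is the reason macro metrics are informative on unbalanced data.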

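from keras import backend as K

def recall(y_true, y_pred):
    # Macro recall over the 16 one-hot encoded classes, from argmax decisions
    total = 0.
    pred = K.argmax(y_pred)
    true = K.argmax(y_true)
    for i in range(16):
        # Float masks, so the division below does not mix int and float tensors
        p = K.cast(K.equal(pred, i), K.floatx())
        t = K.cast(K.equal(true, i), K.floatx())
        # Compute the true positives for class i
        common = K.sum(t * p)
        # Divide by all actual positives in t (K.epsilon() avoids division by zero)
        total += common / (K.sum(t) + K.epsilon())
    return total / 16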

This gives you the mean recall over all classes. You could also print the value for each class.

Tell me if you have any questions!

For an implementation of the binary recall, see this question, from which the code is adapted.
