Type of precision

Problem description

Precision is computed with the Keras library as follows:

model.compile(optimizer='sgd',
          loss='mse',
          metrics=[tf.keras.metrics.Precision()])

Which averaging mode of sklearn's precision_score yields the same precision as the one computed by Keras?

precision_score(y_true, y_pred, average=???)

  1. weighted

What happens when you set zero_division to 1, as below?

precision_score(y_true, y_pred, average=None, zero_division=1)

Recommended answer

TL;DR: The Keras default corresponds to average='binary' for binary classification and average='micro' for multi-class classification. Other averaging types such as None and macro can also be reproduced with minor modifications, as explained below.

This should give you some clarity on the differences between tf.keras.metrics.Precision() and sklearn.metrics.precision_score(). Let's compare different scenarios.

Scenario 1: Binary classification

For binary classification, y_true holds labels in {0, 1} and y_pred holds either labels in {0, 1} or scores in the range 0 to 1. tf.keras thresholds scores at 0.5 by default, while sklearn's precision_score expects hard labels. The implementation for both is quite straightforward.

Sklearn documentation: Only report results for the class specified by pos_label. This is applicable only if targets (y_{true,pred}) are binary.

#Binary classification

from sklearn.metrics import precision_score
import tensorflow as tf

y_true = [0,1,1,1]
y_pred = [1,0,1,1]

print('sklearn precision: ',precision_score(y_true, y_pred, average='binary'))
#Only report results for the class specified by pos_label. 
#This is applicable only if targets (y_{true,pred}) are binary.

m = tf.keras.metrics.Precision()
m.update_state(y_true, y_pred)
print('tf.keras precision:',m.result().numpy())

sklearn precision:  0.6666666666666666
tf.keras precision: 0.6666667
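
To make the number above less of a black box, here is a minimal hand-rolled sketch of binary precision, TP / (TP + FP), in plain Python. The helper name binary_precision and its threshold argument are illustrative, not part of either library; the 0.5 cut-off mirrors the tf.keras default, and returning 0.0 when nothing is predicted positive is an assumption of this sketch.

```python
# Hand-rolled binary precision: TP / (TP + FP) for the positive class.
def binary_precision(y_true, y_pred, threshold=0.5):
    # Scores strictly above the threshold count as positive predictions;
    # hard 0/1 labels pass through unchanged with threshold=0.5.
    predicted = [1 if p > threshold else 0 for p in y_pred]
    tp = sum(1 for t, p in zip(y_true, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, predicted) if t == 0 and p == 1)
    # Assumption of this sketch: report 0.0 when nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

y_true = [0, 1, 1, 1]
y_pred = [1, 0, 1, 1]

binary_result = binary_precision(y_true, y_pred)
print('hand-rolled precision:', binary_result)  # 2 TP and 1 FP give 2/3
```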

Scenario 2: Multi-class classification (global precision)

Here you are working with multi-class labels, but you do not care about the precision of each individual class. You simply want one global count of TP and FP from which to calculate a total precision score. In sklearn this is selected with average='micro', while in tf.keras it is the default behavior of Precision().

Sklearn documentation: Calculate metrics globally by counting the total true positives, false negatives and false positives.

#Multi-class classification (global precision)

#3 classes, 6 samples
y_true = [[1,0,0],[0,1,0],[0,0,1],[1,0,0],[0,1,0],[0,0,1]]
y_pred = [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[1,0,0],[0,1,0]]

print('sklearn precision: ',precision_score(y_true, y_pred, average='micro'))
#Calculate metrics globally by counting the total true positives, false negatives and false positives.

#A fresh metric object starts with empty state, so no reset call is needed
m = tf.keras.metrics.Precision()
m.update_state(y_true, y_pred)
print('tf.keras precision:',m.result().numpy())

sklearn precision:  0.3333333333333333
tf.keras precision: 0.33333334
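
The global (micro) score can be checked by hand: pool every predicted positive across all classes and count how many are correct. The sketch below does this in plain Python on the same one-hot data; micro_precision is a hypothetical helper written for illustration, not a library function.

```python
# Micro precision: pool TP and FP over every class, then divide once.
def micro_precision(y_true, y_pred):
    tp = fp = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1   # predicted positive and correct
            elif p == 1 and t == 0:
                fp += 1   # predicted positive but wrong
    # Assumption of this sketch: report 0.0 if nothing was predicted positive.
    return tp / (tp + fp) if (tp + fp) else 0.0

# 3 classes, 6 samples (same data as above)
y_true = [[1,0,0],[0,1,0],[0,0,1],[1,0,0],[0,1,0],[0,0,1]]
y_pred = [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[1,0,0],[0,1,0]]

micro_result = micro_precision(y_true, y_pred)
print('hand-rolled micro precision:', micro_result)  # 2 TP and 4 FP give 1/3
```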

Scenario 3: Multi-class classification (binary precision for each label)

This scenario applies if you want to know the precision of each individual class. In sklearn this is done by setting the average parameter to None, while in tf.keras you have to instantiate a separate Precision object for each class using class_id.

Sklearn documentation: If None, the scores for each class are returned.

#Multi-class classification (binary precision for each label)

#3 classes, 6 samples
y_true = [[1,0,0],[0,1,0],[0,0,1],[1,0,0],[0,1,0],[0,0,1]]
y_pred = [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[1,0,0],[0,1,0]]

print('sklearn precision: ',precision_score(y_true, y_pred, average=None))
#If None, the scores for each class are returned.

#For class 0
m0 = tf.keras.metrics.Precision(class_id=0)
m0.update_state(y_true, y_pred)

#For class 1
m1 = tf.keras.metrics.Precision(class_id=1)
m1.update_state(y_true, y_pred)

#For class 2
m2 = tf.keras.metrics.Precision(class_id=2)
m2.update_state(y_true, y_pred)

mm = [m0.result().numpy(), m1.result().numpy(), m2.result().numpy()]

print('tf.keras precision:',mm)

sklearn precision:  [0.66666667 0.         0.        ]
tf.keras precision: [0.6666667, 0.0, 0.0]
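
The per-class numbers can also be reproduced by treating each column of the one-hot arrays as its own binary problem. This is a plain-Python sketch; per_class_precision is an illustrative name, and returning 0.0 for a class that is never predicted is an assumption that matches sklearn's default behavior only in value (sklearn also warns).

```python
# Per-class precision: one TP / (TP + FP) ratio for each column (class).
def per_class_precision(y_true, y_pred):
    n_classes = len(y_true[0])
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if p[c] == 1 and t[c] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if p[c] == 1 and t[c] == 0)
        # Assumption of this sketch: a class that is never predicted scores 0.0.
        scores.append(tp / (tp + fp) if (tp + fp) else 0.0)
    return scores

# 3 classes, 6 samples (same data as above)
y_true = [[1,0,0],[0,1,0],[0,0,1],[1,0,0],[0,1,0],[0,0,1]]
y_pred = [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[1,0,0],[0,1,0]]

class_scores = per_class_precision(y_true, y_pred)
print('hand-rolled per-class precision:', class_scores)
```

Class 0 is predicted three times and is correct twice, while classes 1 and 2 are never predicted correctly, which is where the 2/3, 0, 0 pattern comes from.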

Scenario 4: Multi-class classification (average of individual binary scores)

Once you have calculated the individual precision for each class, you may want to take their average (or weighted average). In sklearn, setting average='macro' takes the simple unweighted mean of the per-class scores. In tf.keras you can get the same result by averaging the individual precisions calculated in the scenario above.

Sklearn documentation: Calculate metrics for each label, and find their unweighted mean.

#Multi-class classification (Average of individual binary scores)

import numpy as np

#3 classes, 6 samples
y_true = [[1,0,0],[0,1,0],[0,0,1],[1,0,0],[0,1,0],[0,0,1]]
y_pred = [[1,0,0],[0,0,1],[0,1,0],[1,0,0],[1,0,0],[0,1,0]]

print('sklearn precision (Macro): ',precision_score(y_true, y_pred, average='macro'))
print('sklearn precision (Avg of None):' ,np.average(precision_score(y_true, y_pred, average=None)))

print(' ')

print('tf.keras precision:',np.average(mm)) #mm is list of individual precision scores

sklearn precision (Macro):  0.2222222222222222
sklearn precision (Avg of None):  0.2222222222222222
 
tf.keras precision: 0.22222222
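
Since the question also mentions weighted averaging, the sketch below contrasts the macro mean with a support-weighted mean, which is how sklearn documents average='weighted' (each class weighted by its number of true instances). It is plain Python and takes the per-class scores from Scenario 3 as given; the variable names are illustrative.

```python
# 3 classes, 6 samples (same y_true as above)
y_true = [[1,0,0],[0,1,0],[0,0,1],[1,0,0],[0,1,0],[0,0,1]]

# Per-class precision as computed in Scenario 3
per_class = [2 / 3, 0.0, 0.0]
n_classes = len(per_class)

# Support = number of true samples of each class
support = [sum(row[c] for row in y_true) for c in range(n_classes)]

# Macro: every class counts equally
macro = sum(per_class) / n_classes

# Weighted: every class counts in proportion to its support
weighted = sum(s * p for s, p in zip(support, per_class)) / sum(support)

print('support: ', support)
print('macro:   ', macro)
print('weighted:', weighted)
```

On this data every class has the same support, so the two averages coincide; they diverge only when the classes are imbalanced.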

NOTE: With sklearn, models typically predict labels directly and precision_score is a standalone function, so it can operate directly on lists of predicted and actual labels. tf.keras.metrics.Precision(), by contrast, is a metric designed to be applied to a binary or multi-class dense output, so it does not work on integer class labels directly. You have to give it an n-length array for each sample, where n is the number of classes (output dense nodes).
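
If you start from integer class labels, one way to get them into the n-length form is a one-hot conversion. The sketch below uses plain Python; one_hot is a hypothetical helper written for illustration, and the label lists are chosen so that they reproduce the arrays used in the scenarios above.

```python
# Convert integer class labels into one-hot rows of length n_classes.
def one_hot(labels, n_classes):
    return [[1 if c == label else 0 for c in range(n_classes)]
            for label in labels]

labels_true = [0, 1, 2, 0, 1, 2]
labels_pred = [0, 2, 1, 0, 0, 1]

y_true = one_hot(labels_true, 3)
y_pred = one_hot(labels_pred, 3)

print(y_true)
print(y_pred)
```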

Hope this clarifies how the two differ in various scenarios. More details can be found in the sklearn documentation and the tf.keras documentation.

Your second question:

As per the sklearn documentation,

zero_division - "warn", 0 or 1, default="warn"
#Sets the value to return when there is a zero division. If set to "warn",
#this acts as 0, but warnings are also raised.

This flag controls what is returned when the calculation hits a division by zero. For precision, that happens when a class is never predicted, so TP + FP = 0. With the default "warn", the score for that class is treated as 0 and a warning is raised. With zero_division=0 it is 0 without a warning, and with zero_division=1 it is reported as 1.
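
The effect is easiest to see on a class that is never predicted. The following plain-Python sketch mimics the documented behavior for per-class precision (average=None); precision_per_label is an illustrative helper, not sklearn's implementation, and it does not emit the warning that "warn" would.

```python
# Per-label precision with an explicit fallback for the 0/0 case.
def precision_per_label(labels_true, labels_pred, n_classes, zero_division=0):
    scores = []
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(labels_true, labels_pred) if p == c and t == c)
        fp = sum(1 for t, p in zip(labels_true, labels_pred) if p == c and t != c)
        if tp + fp == 0:
            # Class c was never predicted: precision is undefined (0/0),
            # so return the configured fallback value instead.
            scores.append(float(zero_division))
        else:
            scores.append(tp / (tp + fp))
    return scores

labels_true = [0, 1, 2, 2]
labels_pred = [0, 0, 1, 1]   # class 2 is never predicted

with_zero = precision_per_label(labels_true, labels_pred, 3, zero_division=0)
with_one = precision_per_label(labels_true, labels_pred, 3, zero_division=1)

print('zero_division=0:', with_zero)
print('zero_division=1:', with_one)
```

Only the undefined entry changes: class 1 is predicted but always wrong, so it stays at 0 in both cases, whereas class 2 flips from 0 to 1.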
