Classification accuracy after recall and precision

Problem description

I'm just wondering if this is a legitimate way of calculating classification accuracy:

  1. obtain precision recall thresholds
  2. for each threshold binarize the continuous y_scores
  3. calculate their accuracy from the contingency table (confusion matrix)
  4. return the average accuracy for the thresholds

from sklearn.metrics import precision_recall_curve, confusion_matrix
from sklearn.preprocessing import binarize
import numpy as np

# precision_recall_curve returns (precision, recall, thresholds) in that order
precision, recall, thresholds = precision_recall_curve(np.array(np_y_true), np.array(np_y_scores))
accuracy = 0
for threshold in thresholds:
    # binarize expects a 2-D array; reshape and take row 0 to recover 1-D predictions
    contingency_table = confusion_matrix(np_y_true, binarize(np.array(np_y_scores).reshape(1, -1), threshold=threshold)[0])
    # accuracy = (TN + TP) / total
    accuracy += (float(contingency_table[0][0]) + float(contingency_table[1][1])) / float(np.sum(contingency_table))

print("Classification accuracy is: {}".format(accuracy / len(thresholds)))

Answer

You are heading in the right direction. The confusion matrix is definitely the right starting point for computing your classifier's accuracy. It seems to me that you are aiming at receiver operating characteristics.

In statistics, a receiver operating characteristic (ROC), or ROC curve, is a graphical plot that illustrates the performance of a binary classifier system as its discrimination threshold is varied. https://en.wikipedia.org/wiki/Receiver_operating_characteristic
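
To make the threshold sweep concrete (my own illustration, using placeholder data rather than anything from the question), scikit-learn's roc_curve traces exactly this curve:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

# placeholder 0/1 labels and continuous scores, purely for illustration
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

# one (fpr, tpr) point per candidate discrimination threshold
fpr, tpr, thresholds = roc_curve(y_true, y_scores)

plt.plot(fpr, tpr)
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve")
plt.show()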

The AUC (area under the curve) is a measurement of your classifier's performance. More information and explanation can be found here:

https://stats.stackexchange.com/questions/132777/what-does-auc-stand-for-and-what-is-it

http://mlwiki.org/index.php/ROC_Analysis
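
If you only need the number rather than the plot (again my addition, reusing the same placeholder data as above), roc_auc_score computes the AUC directly from labels and scores:

import numpy as np
from sklearn.metrics import roc_auc_score

# same placeholder labels and scores as in the sketch above
y_true = np.array([0, 0, 1, 1, 0, 1])
y_scores = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7])

print("AUC: {}".format(roc_auc_score(y_true, y_scores)))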

This is my implementation, which you are welcome to improve/comment:

import numpy as np
import matplotlib.pyplot as plt

def auc(y_true, y_val, plot=False):
    # check input
    if len(y_true) != len(y_val):
        raise ValueError('Label vector (y_true) and corresponding value vector (y_val) must have the same length.\n')
    # empty lists for true positive and false positive counts
    tp = []
    fp = []
    # count 1's and -1's in y_true
    cond_positive = list(y_true).count(1)
    cond_negative = list(y_true).count(-1)
    # all possibly relevant bias parameters stored in a list
    bias_set = sorted(list(set(y_val)), key=float, reverse=True)
    bias_set.append(min(bias_set) * 0.9)

    # initialize y_pred array full of negative predictions (-1)
    y_pred = np.ones(len(y_true)) * (-1)

    # the computation time is mainly influenced by this for loop
    # for a contamination rate of 1% it already takes ~8s to terminate
    for bias in bias_set:
        # "lower values tend to correspond to label -1"
        # indices of values which exceed the bias
        posIdx = np.where(y_val > bias)
        # set predicted values to 1
        y_pred[posIdx] = 1
        # the following sum simply yields values which distinguish
        # the cases of true positive and false positive
        results = np.asarray(y_true) + 2 * np.asarray(y_pred)
        # append the number of tp's and fp's
        tp.append(float(list(results).count(3)))
        fp.append(float(list(results).count(1)))

    # calculate true positive / false positive rates
    tpr = np.asarray(tp) / cond_positive
    fpr = np.asarray(fp) / cond_negative
    # optional scatter plot of the ROC points
    if plot:
        plt.scatter(fpr, tpr)
        plt.show()
    # calculate AUC via trapezoidal integration
    AUC = np.trapz(tpr, fpr)

    return AUC
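
A quick usage sketch (my own, with made-up scores; note that the function above expects labels coded as 1 and -1 and a NumPy array of scores):

import numpy as np

# made-up labels (1 / -1) and continuous scores, purely for illustration
y_true = np.array([1, -1, 1, 1, -1, -1, 1, -1])
y_val = np.array([0.9, 0.2, 0.75, 0.6, 0.3, 0.45, 0.8, 0.1])

print("AUC: {}".format(auc(y_true, y_val, plot=False)))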
