AUC-ROC for a non-ranking classifier such as OSVM

Problem Description

I'm currently working with AUC-ROC curves. Let's say I have a non-ranking classifier, such as a one-class SVM, where the predictions are either 0 or 1 and are not easily converted into probabilities or scores. If I don't want to plot the ROC curve and would only like to calculate the AUC to see how well my model is doing, can I still do that? Would it still be called an AUC, especially given that there are only two thresholds that can be used (0, 1)? If so, would it be as good as calculating the AUC from ranking scores?

Now let's say that I have decided to plot the AUC-ROC using the labels created by the SVM (0, 1); it would look like the picture below.

Would it still be considered an ROC curve?

Thank you very much for all your help and support.

Note: I have read the questions below and did not find an answer:
https://www.researchgate.net/post/How_can_I_plot_determine_ROC_AUC_for_SVM
https://stats.stackexchange.com/questions/37795/roc-curve-for-discrete-classifiers-like-svm-why-do-we-still-call-it-a-curve

Recommended Answer

The standard ROC curve requires varying the probability or score threshold of your classifier and plotting the corresponding (false positive rate, true positive rate) pairs, one point for each threshold value.
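
For illustration, here is a minimal sketch of that construction using scikit-learn's roc_curve; the y_true and scores arrays are made-up placeholders, not data from your model:

    import numpy as np
    from sklearn.metrics import roc_curve, auc

    y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])                   # ground-truth labels (placeholders)
    scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5])  # continuous classifier scores

    # roc_curve sweeps the threshold over the distinct score values and
    # returns one (FPR, TPR) pair per threshold; auc integrates the curve.
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    print("AUC:", auc(fpr, tpr))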

Since the one-class SVM is defined in such a way that it does not produce probabilities or scores as part of its output (unlike standard SVM classifiers), an ROC curve is inapplicable unless you create your own version of a score, as discussed below.

Furthermore, training a one-class SVM is inherently, hugely imbalanced, because the training data is solely a set of "positive" examples, i.e. observations drawn from the distribution in question. ROC curves suffer badly from large class imbalance in any case, so the ROC curve could be misleading, in the sense that the classification scores for a small number of outliers would matter far more than the scores for the many non-outliers near the heart of the observed distribution's highest-density regions. So avoiding ROC for this type of model, even if you do create your own scores, is advisable.

You are correct to choose precision vs. recall as the better metric, but in the plot shown in your question you are still plotting against true positive rate and false positive rate on the axes, while the AUC-PR (precision-recall AUC score) looks like just a single point padded with 0 for the false positive rate (i.e. it is purely a bug in your plotting code).

In order to get an actual precision-recall curve, you need some way of associating a score with each outlier decision. One suggestion is to use the decision_function method of the fitted OneClassSVM object after training.

If you compute the maximum of decision_function(x) over all input values x, and call this MAX, then one way of associating a score is to treat the score for the prediction on some data y as score = MAX - decision_function(y).

This assumes you have the labels set up in such a way that a large value of decision_function(x) means x is not an outlier, i.e. it carries the label of the positive class used for training. You could take the reciprocal or use other transformations if you set up your problem with reversed labels (that is, depending on whether you set the OneClassSVM to predict '1' for an outlier or '1' for an inlier, even though the training data consists of only one class).
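
As a concrete sketch of that recipe (assuming scikit-learn; X_train and X_test are illustrative placeholders, and higher outlier_scores values mean more outlier-like):

    import numpy as np
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)
    X_train = rng.normal(size=(200, 2))                     # inliers only: one "positive" class
    X_test = np.vstack([rng.normal(size=(50, 2)),           # inliers
                        rng.uniform(-6, 6, size=(10, 2))])  # outliers

    ocsvm = OneClassSVM(nu=0.05, gamma="scale").fit(X_train)

    # In scikit-learn, large decision_function values mean "not an outlier",
    # matching the label orientation assumed in the paragraph above.
    MAX = ocsvm.decision_function(X_train).max()
    outlier_scores = MAX - ocsvm.decision_function(X_test)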

Then, in the documentation of average_precision_score, you can see that the input y_score can be a non-thresholded measure, such as the output of decision_function. You could also tinker with this, perhaps taking the log of that score, etc., if you have any domain knowledge that gives you a reason to try it.
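
Continuing the sketch above, the manually created scores can be passed straight to average_precision_score; y_test here is a hypothetical ground truth that marks the outliers of X_test as the positive class:

    import numpy as np
    from sklearn.metrics import average_precision_score

    y_test = np.array([0] * 50 + [1] * 10)  # 0 = inlier, 1 = outlier, matching X_test above
    print("average precision:", average_precision_score(y_test, outlier_scores))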

Once you have these manually created scores, you can pass them to any of the precision/recall functions that need to vary a threshold. It's not perfect, but it at least gives you a sense of how well the decision boundary serves the classification.
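
For example, precision_recall_curve varies the threshold internally over the same manually created scores (again using the hypothetical y_test and outlier_scores from the sketches above):

    import numpy as np
    from sklearn.metrics import precision_recall_curve

    precision, recall, thresholds = precision_recall_curve(y_test, outlier_scores)
    # precision and recall have one more entry than thresholds (the final
    # point has recall=0, precision=1), hence the np.inf padding for printing.
    for p, r, t in zip(precision, recall, np.append(thresholds, np.inf)):
        print(f"threshold={t:.3f}  precision={p:.3f}  recall={r:.3f}")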
