Convert predicted probabilities after downsampling to actual probabilities in classification (using mlr)

Problem description

If I use undersampling to train a model on an unbalanced binary target variable, the prediction method calculates probabilities under the assumption of a balanced data set. How can I convert these probabilities to actual probabilities for the unbalanced data? Is there a conversion argument/function implemented in the mlr package or in another package? For example:

library(mlr)

# Unbalanced binary target: roughly 10% "0" (minority) and 90% "1" (majority)
a <- data.frame(y = factor(sample(0:1, prob = c(0.1, 0.9), replace = TRUE, size = 100)))
a$x <- as.numeric(a$y) + rnorm(n = 100, sd = 1)
task <- makeClassifTask(data = a, target = "y", positive = "0")
learner <- makeLearner("classif.binomial", predict.type = "prob")
# Keep only 10% of the majority class "1" during training
learner <- makeUndersampleWrapper(learner, usw.rate = 0.1, usw.cl = "1")
model <- train(learner, task, subset = 1:50)
pred <- predict(model, task, subset = 51:100)
head(pred$data)

Recommended answer

Paper title: "Calibrating Probability with Undersampling for Unbalanced Classification" by Andrea Dal Pozzolo, Olivier Caelen, Reid A. Johnson, and Gianluca Bontempi.

It is specifically designed to tackle the issue of calibration in the case of downsampling, i.e. transforming the predicted probabilities of your classifier into actual probabilities for the unbalanced case.

You just have to correct your predicted probability p_s using the following formula:

   p = beta * p_s / ((beta-1) * p_s + 1)

where beta is the ratio of the number of majority-class instances after undersampling to the number of majority-class instances in the original training set.
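As a minimal sketch of how the formula above could be applied in R: the helper name correct_undersampling is hypothetical (it is not an mlr function), and the sketch assumes that p_s is the predicted probability of the minority class and that beta is known from the undersampling rate.

```r
# Hypothetical helper, not part of mlr: applies the correction formula
#   p = beta * p_s / ((beta - 1) * p_s + 1)
# p_s:  predicted minority-class probability from the model trained on
#       undersampled data (assumption: values in [0, 1])
# beta: fraction of majority-class instances kept by undersampling
correct_undersampling <- function(p_s, beta) {
  stopifnot(beta > 0, beta <= 1, all(p_s >= 0 & p_s <= 1))
  beta * p_s / ((beta - 1) * p_s + 1)
}

# Illustrative values only: 10% of the majority class was kept
beta <- 0.1
p_s <- c(0, 0.25, 0.5, 0.9, 1)
p <- correct_undersampling(p_s, beta)
print(round(p, 4))  # approx. 0, 0.0323, 0.0909, 0.4737, 1
```

For the example in the question, usw.rate = 0.1 on class "1" suggests beta = 0.1, and p_s would be the predicted probability of the positive class "0" (for instance, as returned by getPredictionProbabilities(pred)). With beta = 1, meaning no undersampling, the correction leaves the probabilities unchanged.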

Other methods

Other methods that are not specifically focused on the downsampling bias have also been proposed. The most popular ones are all implemented in R.
