How to create a custom eval metric for catboost?


Problem description

Similar question:

Catboost tutorials

In this question, I have a binary classification problem. After modelling, we have the model's predictions on the test set, y_pred, and we already have the true test labels, y_true.

I would like to use a custom evaluation metric defined by the following equation:

profit = 400 * truePositive - 200 * falseNegative - 100 * falsePositive

Also, since a higher profit is better, I would like to maximize this function rather than minimize it.

How can I use this as the eval_metric in catboost?

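import sklearn.metrics

def get_profit(y_true, y_pred):
    # labels=[0, 1] guarantees a 2x2 matrix even if one class is missing
    tn, fp, fn, tp = sklearn.metrics.confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    profit = 400*tp - 200*fn - 100*fp
    return profit

scoring = sklearn.metrics.make_scorer(get_profit, greater_is_better=True)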

Using catboost

class ProfitMetric(object):
    def get_final_error(self, error, weight):
        return error / (weight + 1e-38)

    def is_max_optimal(self):
        return True

    def evaluate(self, approxes, target, weight):
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        error_sum = 0.0
        weight_sum = 0.0

        # ** I don't know what to put here **

        return error_sum, weight_sum

Question

How do I complete the custom eval metric in catboost?

Update: my attempt so far

import numpy as np
import pandas as pd
import seaborn as sns
import sklearn

from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split

def get_profit(y_true, y_pred):
    tn, fp, fn, tp = sklearn.metrics.confusion_matrix(y_true,y_pred).ravel()
    profit = 400*tp - 200*fn - 100*fp
    return profit


class ProfitMetric:
    def is_max_optimal(self):
        return True # greater is better

    def evaluate(self, approxes, target, weight):
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])

        approx = approxes[0]

        y_pred = np.rint(approx)
        y_true = np.array(target).astype(int)

        output_weight = 1 # weight is not used

        score = get_profit(y_true, y_pred)
 
        return score, output_weight

    def get_final_error(self, error, weight):
        return error


df = sns.load_dataset('titanic')
X = df[['survived','pclass','age','sibsp','fare']]
y = X.pop('survived')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)


model = CatBoostClassifier(metric_period=50,
  n_estimators=200,
  eval_metric=ProfitMetric()
)

model.fit(X, y, eval_set=(X_test, y_test)) # this fails

Recommended answer

The main difference from your code is:

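@staticmethod
def get_profit(y_true, y_pred):
    # approxes are raw log-odds: convert to probabilities, then threshold at 0.5
    # (a bare .astype(int) would truncate every probability in (0, 1) to 0)
    y_pred = (expit(y_pred) > 0.5).astype(int)
    y_true = y_true.astype(int)
    #print("ACCURACY:",(y_pred==y_true).mean())
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    profit = 400*tp - 200*fn - 100*fp
    return profit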

It's not obvious from the example you linked what the predictions are, but on inspection it turns out that catboost treats predictions internally as raw log-odds (hat tip @Ben). So, to use confusion_matrix properly, you need to make sure that both y_true and y_pred are integer class labels. This is done via:

y_pred = (scipy.special.expit(y_pred) > 0.5).astype(int)
y_true = y_true.astype(int)
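To make the conversion concrete, here is a minimal pure-Python sketch of what happens to raw log-odds under three treatments: thresholding the sigmoid at 0.5, truncating the sigmoid with an integer cast, and rounding the raw score directly. The `sigmoid` and `to_labels` helpers and the sample scores are made up for illustration and are not part of catboost.

```python
import math

def sigmoid(raw_score):
    """Logistic function: maps a raw log-odds score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-raw_score))

def to_labels(raw_scores, threshold=0.5):
    """Turn raw log-odds into 0/1 class labels by thresholding the probability."""
    return [1 if sigmoid(s) > threshold else 0 for s in raw_scores]

raw_scores = [-2.0, -0.1, 0.3, 4.0]  # made-up raw scores for illustration

# Thresholding at 0.5 is the same as asking whether the raw score is positive.
labels = to_labels(raw_scores)

# int() truncates toward zero, so every probability below 1.0 collapses to 0.
truncated = [int(sigmoid(s)) for s in raw_scores]

# Rounding the raw score itself can give values that are not class labels at all.
rounded = [round(s) for s in raw_scores]

print(labels, truncated, rounded)
```

Values outside {0, 1}, like the ones in `rounded`, make a confusion matrix larger than 2x2, so unpacking it into four counts fails. That is probably what goes wrong with rounding the raw approxes directly.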

So the full working code is:

import numpy as np
import seaborn as sns
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from scipy.special import expit

df = sns.load_dataset('titanic')
X = df[['survived','pclass','age','sibsp','fare']]
y = X.pop('survived')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=100)

class ProfitMetric:
    
    @staticmethod
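    def get_profit(y_true, y_pred):
        # approxes are raw log-odds: convert to probabilities, then threshold at 0.5
        # (a bare .astype(int) would truncate every probability in (0, 1) to 0)
        y_pred = (expit(y_pred) > 0.5).astype(int)
        y_true = y_true.astype(int)
        #print("ACCURACY:",(y_pred==y_true).mean())
        tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
        profit = 400*tp - 200*fn - 100*fp
        return profit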
    
    def is_max_optimal(self):
        return True # greater is better

    def evaluate(self, approxes, target, weight):            
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])
        y_true = np.array(target).astype(int)
        approx = approxes[0]
        score = self.get_profit(y_true, approx)
        return score, 1

    def get_final_error(self, error, weight):
        return error

model = CatBoostClassifier(metric_period=50,
  n_estimators=200,
  eval_metric=ProfitMetric()
)

model.fit(X_train, y_train, eval_set=(X_test, y_test))  # train on the training split only
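The metric above ignores the weight argument and relies on confusion_matrix returning exactly four cells. As a hedged alternative, the sketch below counts the profit terms directly from the raw scores and applies per-sample weights when they are given. The names `profit_from_raw` and `WeightedProfitMetric` are hypothetical. The class only follows the same three-method interface used above (is_max_optimal, evaluate, get_final_error) and has not been run against catboost here.

```python
def profit_from_raw(raw_scores, targets, weights=None):
    """Profit = 400*TP - 200*FN - 100*FP, computed from raw log-odds scores."""
    if weights is None:
        weights = [1.0] * len(targets)  # unweighted case: every sample counts once
    profit = 0.0
    for score, target, w in zip(raw_scores, targets, weights):
        predicted = 1 if score > 0 else 0  # score > 0 <=> probability > 0.5
        if predicted == 1 and target == 1:    # true positive
            profit += 400 * w
        elif predicted == 0 and target == 1:  # false negative
            profit -= 200 * w
        elif predicted == 1 and target == 0:  # false positive
            profit -= 100 * w
        # true negatives contribute nothing
    return profit


class WeightedProfitMetric:
    """Hypothetical variant of ProfitMetric that honors sample weights."""

    def is_max_optimal(self):
        return True  # greater is better

    def evaluate(self, approxes, target, weight):
        assert len(approxes) == 1
        assert len(target) == len(approxes[0])
        return profit_from_raw(approxes[0], target, weight), 1.0

    def get_final_error(self, error, weight):
        return error


# Quick check on made-up data: one TP, one FN, one FP, one TN.
metric = WeightedProfitMetric()
print(metric.evaluate([[2.0, -1.0, 1.5, -3.0]], [1, 1, 0, 0], None))
```

Counting the terms by hand also avoids the unpacking error that confusion_matrix(...).ravel() raises when only one class appears in a batch. The trade-off is a plain Python loop, which will be slower than vectorized code on large evaluation sets.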
