Adjust custom loss function for gradient boosting classification
Question
I have implemented a gradient boosting decision tree to do multiclass classification. My custom loss functions look like this:
import numpy as np
from sklearn.preprocessing import OneHotEncoder

def softmax(mat):
    # subtract the row-wise max before exponentiating for numerical stability
    res = np.exp(mat - np.max(mat, axis=1, keepdims=True))
    return res / np.sum(res, axis=1, keepdims=True)

def custom_asymmetric_objective(y_true, y_pred_encoded):
    # raw scores arrive flattened in Fortran order: one column per class
    pred = y_pred_encoded.reshape((-1, 3), order='F')
    pred = softmax(pred)
    # note: scikit-learn >= 1.2 renames sparse= to sparse_output=
    y_true = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_true.reshape(-1, 1))
    grad = (pred - y_true).astype("float")
    hess = 2.0 * pred * (1.0 - pred)
    return grad.flatten('F'), hess.flatten('F')
def custom_asymmetric_valid(y_true, y_pred_encoded):
    y_true = OneHotEncoder(sparse=False, categories='auto').fit_transform(y_true.reshape(-1, 1)).flatten('F')
    margin = (y_true - y_pred_encoded).astype("float")
    loss = margin * 10
    return "custom_asymmetric_eval", np.mean(loss), False
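Before plugging callbacks like these into a booster, it can help to sanity-check them on toy data. The sketch below re-implements the objective with a plain-NumPy one-hot encoding (an assumption made here only to keep the snippet self-contained; the shapes and Fortran-order flattening match the multiclass callback layout used above):

```python
import numpy as np

def softmax(mat):
    # subtract the row max for numerical stability
    e = np.exp(mat - mat.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def custom_asymmetric_objective(y_true, y_pred_encoded):
    pred = softmax(y_pred_encoded.reshape((-1, 3), order='F'))
    onehot = np.eye(3)[y_true.astype(int)]       # NumPy one-hot, no sklearn
    grad = (pred - onehot).astype("float")
    hess = 2.0 * pred * (1.0 - pred)
    return grad.flatten('F'), hess.flatten('F')

rng = np.random.default_rng(0)
y_true = np.array([0, 1, 2, 1])
raw = rng.normal(size=(4, 3)).flatten('F')       # Fortran-order raw scores
grad, hess = custom_asymmetric_objective(y_true, raw)
print(grad.shape, hess.shape)                    # (12,) (12,)
```

Because each softmax row and each one-hot row both sum to 1, the gradient for every sample sums to zero, which is a quick consistency check on the implementation.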
Everything works, but now I want to adjust my loss function in the following way: it should "penalize" if an item is classified incorrectly, and a penalty should be added for a certain constraint (this is calculated beforehand; let's just say the penalty is e.g. 0.05, so just a real number). Is there any way to consider both the misclassification and the penalty value?
Answer
Try L2 regularization: each weight is updated by subtracting the learning rate times the error times x, plus a penalty term lambda times the weight squared. That is, the penalized loss

    loss = error + lambda * w**2

leads to the update

    w <- w - lr * (error * x + 2 * lambda * w)

Simplifying, the effect is that every step also shrinks each weight toward zero in proportion to its size, so large weights are discouraged.
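As a concrete illustration, the L2-regularized update can be sketched as a single NumPy gradient step (the lr and lam values here are arbitrary illustrations, not taken from the answer):

```python
import numpy as np

def l2_step(w, grad, lr=0.1, lam=0.05):
    # L2 ("weight decay") update: the penalty lam * w**2 in the loss
    # contributes 2 * lam * w to the gradient, shrinking w toward zero
    return w - lr * (grad + 2.0 * lam * w)

w = np.array([1.0, -2.0, 0.5])
g = np.zeros(3)          # a zero data gradient isolates the decay effect
w_new = l2_step(w, g)
print(w_new)             # each weight shrunk by the factor (1 - 2*lr*lam) = 0.99
```

With the data gradient zeroed out, the step multiplies every weight by 0.99, which makes the shrinkage effect of the penalty term easy to see in isolation.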
ADDED: The penalization term (on the right of the equation) increases the generalization power of your model. So, if you overfit your model on the training set, the performance will be poor on the test set. You are therefore penalizing those "right" classifications in the training set that generate errors in the test set and compromise generalization.
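One hypothetical way to combine misclassification with the questioner's precomputed constraint penalty is to scale the gradient and Hessian of currently misclassified rows by (1 + penalty). Everything in this sketch (the 0.05 constant, the NumPy one-hot, and the scaling scheme itself) is an assumption for illustration, not part of the original answer:

```python
import numpy as np

PENALTY = 0.05  # the precomputed constraint penalty (assumed constant)

def softmax(mat):
    e = np.exp(mat - mat.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def penalized_objective(y_true, y_pred_encoded, penalty=PENALTY):
    pred = softmax(y_pred_encoded.reshape((-1, 3), order='F'))
    onehot = np.eye(3)[y_true.astype(int)]
    grad = (pred - onehot).astype("float")
    hess = 2.0 * pred * (1.0 - pred)
    # upweight the gradient of currently misclassified rows by (1 + penalty)
    wrong = pred.argmax(axis=1) != y_true.astype(int)
    grad[wrong] *= 1.0 + penalty
    hess[wrong] *= 1.0 + penalty
    return grad.flatten('F'), hess.flatten('F')
```

With penalty = 0 this reduces to the original objective, so the extra term only affects rows the current model gets wrong.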