How to use `log_loss` in `GridSearchCV` with multi-class labels in Scikit-Learn (sklearn)?
Problem description
I'm trying to use `log_loss` as the `scoring` parameter of `GridSearchCV` to tune this multi-class (6 classes) classifier. I don't understand how to give it a `labels` parameter. Even if I passed `sklearn.metrics.log_loss` directly, the labels would change for each iteration of the cross-validation, so I don't see how to supply the `labels` parameter.
I'm using Python v3.6 and Scikit-Learn v0.18.1.
How can I use `GridSearchCV` with `log_loss` for multi-class model tuning?
My class distribution:

```
1    31
2    18
3    28
4    19
5    17
6    22
Name: encoding, dtype: int64
```
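My code:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# df_attr, Se_targets and cv_indices are defined elsewhere (not shown)
param_test = {"criterion": ["friedman_mse", "mse", "mae"]}
gsearch_gbc = GridSearchCV(estimator=GradientBoostingClassifier(n_estimators=10),
                           param_grid=param_test, scoring="log_loss", n_jobs=1, iid=False, cv=cv_indices)
gsearch_gbc.fit(df_attr, Se_targets)
```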
Here's the tail end of the error (the full traceback is at https://pastebin.com/1CshpEBN):
```
ValueError: y_true contains only one label (1). Please provide the true labels explicitly through the labels argument.
```
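The message is raised by `sklearn.metrics.log_loss` itself, so you can reproduce it outside `GridSearchCV`. The sketch below assumes a recent scikit-learn and uses made-up probabilities (uniform over six classes). It scores a "fold" whose true labels are all `1`. Without `labels` the call fails. With the full class list, the result is the expected ln(6).

```python
import numpy as np
from sklearn.metrics import log_loss

# A hypothetical test fold whose true labels are all class 1,
# scored against (made-up) uniform probabilities over six classes.
y_true = [1, 1, 1]
y_prob = np.full((3, 6), 1 / 6)

raised = False
try:
    log_loss(y_true, y_prob)  # only one label seen, six probability columns
except ValueError as exc:
    raised = True
    print("without labels:", exc)

# Passing the full class list tells log_loss what the six columns mean.
loss = log_loss(y_true, y_prob, labels=[1, 2, 3, 4, 5, 6])
print("with labels:", loss)  # a uniform guess over 6 classes costs ln(6)
```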
UPDATE: Based on @Grr's answer, just use this to build the scorer:

```python
import numpy as np
from sklearn import metrics

log_loss_build = lambda y: metrics.make_scorer(metrics.log_loss, greater_is_better=False, needs_proba=True, labels=sorted(np.unique(y)))
```
Recommended answer
My assumption is that somehow one of your data splits has only one class label in `y_true`. This seems unlikely given the distribution you posted, but I guess it is possible. I haven't run into this before. Still, it seems that [`sklearn.metrics.log_loss`](http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html) expects the `labels` argument if the labels in `y_true` are all the same. The wording of that section of the documentation also suggests that the method assumes a binary classification if `labels` is not passed.
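You can test this hypothesis by inspecting the folds directly. The sketch below makes a few assumptions:

- `cv_indices` is an iterable of `(train_idx, test_idx)` pairs. The question doesn't show how it was built.
- `folds_missing_classes` is a hypothetical helper written for this illustration.
- The labels are rebuilt from the posted class counts, not taken from the real data.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold


def folds_missing_classes(y, cv_splits):
    """Hypothetical helper: map fold number -> classes absent from its test set.

    cv_splits is any iterable of (train_idx, test_idx) pairs, e.g. a list
    like the question's cv_indices or the output of a splitter's .split().
    """
    y = np.asarray(y)
    all_classes = np.unique(y)
    report = {}
    for fold, (_, test_idx) in enumerate(cv_splits):
        missing = np.setdiff1d(all_classes, np.unique(y[test_idx]))
        if missing.size:
            report[fold] = missing.tolist()
    return report


# Labels rebuilt from the class counts posted in the question (sorted by class).
counts = {1: 31, 2: 18, 3: 28, 4: 19, 5: 17, 6: 22}
y = np.repeat(list(counts), list(counts.values()))
X_dummy = np.zeros((len(y), 1))

# Stratified folds keep every class in every test fold.
print(folds_missing_classes(y, StratifiedKFold(n_splits=5).split(X_dummy, y)))  # {}

# Unshuffled KFold on label-sorted data does not: fold 0 holds only class 1.
print(folds_missing_classes(y, KFold(n_splits=5).split(X_dummy)))
```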
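Now, as you correctly assume, you should pass `log_loss` through the `scoring` parameter with the labels fixed. Calling `sklearn.metrics.log_loss(labels=your_labels)` would evaluate the metric immediately, so wrap it with `make_scorer` instead, e.g. `scoring=make_scorer(log_loss, greater_is_better=False, needs_proba=True, labels=your_labels)`.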
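Putting it together, here is a sketch on synthetic data. `make_classification` stands in for the real `df_attr`/`Se_targets`, and the test fold deliberately holds a single class.

- Instead of `make_scorer`, it uses a plain scorer callable with the signature `(estimator, X, y)`. This is because `make_scorer`'s `needs_proba` flag was deprecated in newer releases in favor of `response_method`.
- Setting `labels=estimator.classes_` keeps the label list aligned with the `predict_proba` columns.
- The grid tunes `max_depth` rather than the question's `criterion` values, since recent versions of `GradientBoostingClassifier` no longer accept `"mse"` and `"mae"`.
- `neg_log_loss_all_labels` and the split are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import GridSearchCV


def neg_log_loss_all_labels(estimator, X, y):
    """Hypothetical scorer: negated log loss with the class list pinned.

    labels=estimator.classes_ matches the column order of predict_proba,
    so a test fold containing a single class no longer raises ValueError.
    """
    proba = estimator.predict_proba(X)
    return -log_loss(y, proba, labels=estimator.classes_)


# Synthetic stand-in for the question's 6-class data (135 rows).
X, y = make_classification(n_samples=135, n_classes=6, n_informative=4,
                           random_state=0)

# Hypothetical split whose test fold contains only class 0,
# mimicking the situation behind the error message.
test_idx = np.flatnonzero(y == 0)[:5]
train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
cv_indices = [(train_idx, test_idx)]

gsearch = GridSearchCV(
    estimator=GradientBoostingClassifier(n_estimators=10, random_state=0),
    param_grid={"max_depth": [2, 3]},
    scoring=neg_log_loss_all_labels,  # higher (closer to 0) is better
    cv=cv_indices,
)
gsearch.fit(X, y)
print(gsearch.best_params_, gsearch.best_score_)
```

The asker's `labels=sorted(np.unique(y))` variant works the same way as long as every training fold contains all six classes. If a training fold lacks a class, `predict_proba` returns fewer columns than there are labels.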