Feature selection for multilabel classification (scikit-learn)

Question

I'm trying to do feature selection with the chi-square method in scikit-learn (sklearn.feature_selection.SelectKBest). When I apply it to a multilabel problem, I get this warning:

UserWarning: Duplicate scores. Result may depend on feature ordering. There are probably duplicate features, or you used a classification score for a regression task.
  warn("Duplicate scores. Result may depend on feature ordering."

Why does it appear, and how do I properly apply feature selection in this case?

Recommended answer

The code is warning you that arbitrary tie-breaking may be needed because some features have exactly the same score.
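To make the tie concrete, here is a minimal sketch on a made-up toy matrix (the data, and the single-label `y` used to keep it short, are assumptions for illustration, not taken from the question). Two identical feature columns necessarily receive the same chi-square score, which is exactly the situation the warning describes:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Toy count data (hypothetical): columns 0 and 1 are exact duplicates.
X = np.array([[1, 1, 0],
              [0, 0, 3],
              [2, 2, 1],
              [0, 0, 2]])
y = np.array([0, 1, 0, 1])

# chi2 returns one score (and one p-value) per feature column.
scores, p_values = chi2(X, y)

# A tie exists whenever there are fewer distinct scores than features.
tied = len(np.unique(scores)) < len(scores)
print(scores, tied)
```

With ties like this, which of the tied features survives a top-k cut depends on column order rather than on the data.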

That said, feature selection does not actually work for multilabel problems out of the box; the best you can currently do is tie feature selection and a classifier together in a pipeline, then feed that to a multilabel meta-estimator. Example (untested):

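from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
clf = Pipeline([('chi2', SelectKBest(chi2, k=1000)), ('svm', LinearSVC())])
multi_clf = OneVsRestClassifier(clf)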
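For a version that can be run end to end, the following sketch fits the same kind of pipeline on synthetic data. The dataset from `make_multilabel_classification`, its sizes, and `k=10` are assumptions chosen only to keep the example small; they are not part of the original answer. Because `OneVsRestClassifier` clones the pipeline once per label, each label ends up with its own set of selected features:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Synthetic multilabel data (assumed sizes); features are non-negative
# counts, which is what chi2 requires. Y is a binary indicator matrix.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=50, n_classes=4, random_state=0
)

# Feature selection and classifier bundled together, one copy per label.
clf = Pipeline([("chi2", SelectKBest(chi2, k=10)), ("svm", LinearSVC())])
multi_clf = OneVsRestClassifier(clf).fit(X, Y)

pred = multi_clf.predict(X)

# Indices of the features kept for each label's binary sub-problem.
selected = [
    est.named_steps["chi2"].get_support(indices=True)
    for est in multi_clf.estimators_
]
```

Inspecting `selected` shows which columns were chosen per label; the sets generally differ between labels, which is the point of selecting inside the one-vs-rest wrapper.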

(I believe this warning is issued even when the tied features aren't actually the k-th and (k+1)-th ones. It can usually be ignored safely.)
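If you want to check whether a tie actually matters for a given k, one option is to compare the scores on either side of the cutoff. The helper below is a hypothetical sketch (`tie_at_cutoff` is not a scikit-learn function) and works on any array of scores, such as the `scores_` attribute of a fitted `SelectKBest`:

```python
import numpy as np

def tie_at_cutoff(scores, k):
    """Return True if the k-th and (k+1)-th best scores are equal."""
    ordered = np.sort(np.asarray(scores, dtype=float))[::-1]  # descending
    if k <= 0 or k >= len(ordered):
        return False  # no boundary between kept and dropped features
    return bool(ordered[k - 1] == ordered[k])

example_scores = [3.0, 3.0, 2.5, 1.0]
print(tie_at_cutoff(example_scores, k=1))  # True: the tie straddles the cutoff
print(tie_at_cutoff(example_scores, k=2))  # False: both tied features are kept
```

Only the first case makes the selected set depend on feature ordering; in the second, the tie is harmless.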
