sklearn: get feature names after L1-based feature selection


Question


This question and answer demonstrate that when feature selection is performed using one of scikit-learn's dedicated feature selection routines, then the names of the selected features can be retrieved as follows:

np.asarray(vectorizer.get_feature_names())[featureSelector.get_support()]

For example, in the above code, featureSelector might be an instance of sklearn.feature_selection.SelectKBest or sklearn.feature_selection.SelectPercentile, since these classes implement the get_support method which returns a boolean mask or integer indices of the selected features.

When one performs feature selection via linear models penalized with the L1 norm, it's unclear how to accomplish this. sklearn.svm.LinearSVC has no get_support method and the documentation doesn't make clear how to retrieve the feature indices after using its transform method to eliminate features from a collection of samples. Am I missing something here?

Solution

For sparse estimators you can generally find the support by checking where the non-zero entries are in the coefficient vector (provided a coefficient vector exists, as it does for e.g. linear models):

support = np.flatnonzero(estimator.coef_)
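As a tiny illustration with made-up numbers (the coefficient values below are hypothetical, standing in for a fitted model's coef_), np.flatnonzero returns exactly the indices of the features the L1 penalty kept:

```python
import numpy as np

# Hypothetical coefficient vector from an L1-penalized linear model:
# most entries have been driven exactly to zero by the penalty.
coef = np.array([0.0, 1.3, 0.0, 0.0, -0.7, 0.0, 2.1])

# Indices of the non-zero coefficients, i.e. the selected features.
support = np.flatnonzero(coef)
# support is array([1, 4, 6])
```

Note that for a fitted binary classifier, coef_ has shape (1, n_features), so the flat indices still map directly onto feature indices; for a multiclass model, coef_ has one row per class and you would inspect rows individually.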

For your LinearSVC with an L1 penalty, it would accordingly be:

import numpy as np
from sklearn.svm import LinearSVC

svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, y)
# Non-zero coefficients mark the features the L1 penalty kept.
selected_feature_names = np.asarray(vectorizer.get_feature_names())[np.flatnonzero(svc.coef_)]
