sklearn: get feature names after L1-based feature selection
Problem description
This question and answer demonstrate that when feature selection is performed using one of scikit-learn's dedicated feature selection routines, then the names of the selected features can be retrieved as follows:
np.asarray(vectorizer.get_feature_names())[featureSelector.get_support()]
For example, in the above code, featureSelector might be an instance of sklearn.feature_selection.SelectKBest or sklearn.feature_selection.SelectPercentile, since these classes implement the get_support method, which returns a boolean mask or the integer indices of the selected features.
When one performs feature selection via linear models penalized with the L1 norm, it's unclear how to accomplish this. sklearn.svm.LinearSVC has no get_support method, and the documentation doesn't make clear how to retrieve the feature indices after using its transform method to eliminate features from a collection of samples. Am I missing something here?
For sparse estimators you can generally find the support by checking where the non-zero entries are in the coefficient vector (provided a coefficient vector exists, which is the case for e.g. linear models):
support = np.flatnonzero(estimator.coef_)
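To see what np.flatnonzero does here, a tiny example with a made-up coefficient vector:

```python
import numpy as np

# Hypothetical coefficient vector from an L1-penalized model:
# most entries have been driven exactly to zero
coef = np.array([0.0, 1.3, 0.0, -0.7, 0.0])

# flatnonzero returns the indices of the non-zero entries,
# i.e. the support of the sparse estimator
support = np.flatnonzero(coef)
print(support)  # → [1 3]
```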
For your LinearSVC with an l1 penalty it would accordingly be:
import numpy as np
from sklearn.svm import LinearSVC

# An L1 penalty drives some coefficients exactly to zero,
# so the non-zero entries of coef_ are the selected features
svc = LinearSVC(C=1., penalty='l1', dual=False)
svc.fit(X, y)
selected_feature_names = np.asarray(vectorizer.get_feature_names())[np.flatnonzero(svc.coef_)]