Python feature selection in a pipeline: how to determine feature names?

Views: 114. Published: 2020/11/3 23:59:49. Tags: scikit-learn, pipeline, feature-selection


Question

I used a pipeline and grid search to select the best parameters, and then used those parameters to fit the best pipeline ('best_pipe'). However, because the feature selection step (SelectKBest) is inside the pipeline, I never fit SelectKBest directly.

I need to know the feature names of the 'k' selected features. Any ideas on how to retrieve them? Thank you in advance.

from sklearn import (feature_selection, linear_model, model_selection,
                     pipeline, preprocessing)

# X is assumed to be a pandas DataFrame of features and y the matching target Series.
folds = 5
# random_state only applies when shuffle=True, so it is omitted here.
split = model_selection.StratifiedKFold(n_splits=folds, shuffle=False)

scores = []
for k, (train, test) in enumerate(split.split(X, y)):

    # The splitter yields positional indices, so use .iloc (the old .ix is gone from pandas).
    X_train, X_test, y_train, y_test = X.iloc[train], X.iloc[test], y.iloc[train], y.iloc[test]

    top_feat = feature_selection.SelectKBest()

    # liblinear supports both the 'l1' and 'l2' penalties searched below.
    pipe = pipeline.Pipeline([('scaler', preprocessing.StandardScaler()),
                                 ('feat', top_feat),
                                 ('clf', linear_model.LogisticRegression(solver='liblinear'))])

    K = [40, 60, 80, 100]
    C = [1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]
    penalty = ['l1', 'l2']

    param_grid = [{'feat__k': K,
                  'clf__C': C,
                  'clf__penalty': penalty}]

    scoring = 'precision'

    gs = model_selection.GridSearchCV(estimator=pipe, param_grid=param_grid, scoring=scoring)
    gs.fit(X_train, y_train)

    best_score = gs.best_score_
    scores.append(best_score)

    print("Fold: {} {} {:.4f}".format(k + 1, scoring, best_score))
    print(gs.best_params_)

# Rebuild the pipeline with the chosen parameters and fit it (here on the last fold's split).
best_pipe = pipeline.Pipeline([('scale', preprocessing.StandardScaler()),
                          ('feat', feature_selection.SelectKBest(k=80)),
                          ('clf', linear_model.LogisticRegression(C=.0001, penalty='l2'))])

best_pipe.fit(X_train, y_train)
best_pipe.predict(X_test)

Recommended answer

You can access the feature selector by name in best_pipe:

features = best_pipe.named_steps['feat']
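
To make it concrete that fitting the pipeline also fits the selector inside it, here is a minimal sketch. The synthetic data from make_classification, the column names f0..f9, and k=3 are assumptions made only for illustration; they stand in for the question's X, y, and k=80.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (hypothetical column names).
X_arr, y_arr = make_classification(n_samples=200, n_features=10,
                                   n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=["f{}".format(i) for i in range(10)])
y = pd.Series(y_arr)

best_pipe = Pipeline([("scale", StandardScaler()),
                      ("feat", SelectKBest(k=3)),
                      ("clf", LogisticRegression())])
best_pipe.fit(X, y)

# The step retrieved by name is the same fitted object the pipeline used.
features = best_pipe.named_steps["feat"]
print(features.scores_.shape)         # one univariate score per input column
print(features.get_support().sum())   # number of columns the selector kept
```

Because the selector is fitted as part of best_pipe.fit(), its fitted attributes such as scores_ are available without fitting SelectKBest separately.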

Then you can call transform() on an array of column indices to get the names of the selected columns. transform() expects 2D input, so the indices are passed as a single row (with numpy imported as np):

X.columns[features.transform(np.arange(len(X.columns)).reshape(1, -1))[0]]

The output here will be the eighty column names selected in the pipeline.
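
As a cross-check on the transform() approach, the sketch below compares it with the selector's boolean mask from get_support(), and reads the selector straight from GridSearchCV's refitted best_estimator_ instead of rebuilding the pipeline by hand. The synthetic data, the column names, and the small parameter grid are assumptions for illustration only, not the question's actual setup.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (hypothetical column names).
X_arr, y_arr = make_classification(n_samples=200, n_features=10,
                                   n_informative=4, random_state=0)
X = pd.DataFrame(X_arr, columns=["f{}".format(i) for i in range(10)])
y = pd.Series(y_arr)

pipe = Pipeline([("scaler", StandardScaler()),
                 ("feat", SelectKBest()),
                 ("clf", LogisticRegression())])

# A deliberately small grid so the sketch runs quickly.
gs = GridSearchCV(pipe,
                  param_grid={"feat__k": [2, 4, 6], "clf__C": [1.0, 0.1]},
                  scoring="precision", cv=3)
gs.fit(X, y)

# With the default refit=True, best_estimator_ is the best pipeline
# refitted on all of the data passed to fit().
selector = gs.best_estimator_.named_steps["feat"]

# Option 1: boolean mask over the input columns.
names_from_mask = X.columns[selector.get_support()]

# Option 2: push a single row of column indices through transform().
idx = selector.transform(np.arange(X.shape[1]).reshape(1, -1))[0]
names_from_transform = X.columns[idx]

print(list(names_from_mask))
print(list(names_from_transform))
```

Both options should list the same columns in the same order. The mask version avoids building the index array, while the transform() version mirrors the answer above.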
