How to get feature names from the output of GridSearchCV

Question

I implemented PCA followed by a Naive Bayes classifier using sklearn, and I optimized the number of PCA components using GridSearchCV.

I tried to figure out the feature names of the best estimator, but I was not able to. Here is the code I tried:

# features, labels: the asker's data (not shown in the question)
from sklearn import decomposition
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection instead
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

features_train, features_test, labels_train, labels_test = \
    train_test_split(features, labels, test_size=0.3, random_state=42)

# A Naive Bayes classifier preceded by PCA; the number of components is tuned by grid search
pca = decomposition.PCA()
clf = Pipeline(steps=[('pca', pca), ('gaussian_NB', GaussianNB())])
n_components = [3, 5, 7, 9]
clf = GridSearchCV(clf, dict(pca__n_components=n_components))

clf = clf.fit(features_train, labels_train)
features_pred = clf.predict(features_test)
print("The number of components of the best estimator is", clf.best_estimator_.named_steps['pca'].n_components)
print("The best parameters:", clf.best_params_)

estimator = clf.best_estimator_
# The attempt below fails: the pipeline has no step named 'features'
# (its steps are 'pca' and 'gaussian_NB'), and PCA has no get_feature_names() method.
# print("The features are:", estimator['features'].get_feature_names())
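As a self-contained sketch of the same setup (not the asker's code), the following uses the iris dataset as a stand-in for the unshown `features` and `labels`, a grid of `[1, 2, 3]` components because iris has only 4 columns, and `sklearn.model_selection`, which assumes scikit-learn 0.18 or later. It shows what the best pipeline's PCA step actually stores: a `components_` array with one row per retained component and one column per original feature, but no feature names.

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Stand-in data: iris has 4 original features (the asker's data is not shown).
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

pipe = Pipeline(steps=[("pca", PCA()), ("gaussian_NB", GaussianNB())])
# Grid sized for 4 input features (the question used [3, 5, 7, 9]).
search = GridSearchCV(pipe, {"pca__n_components": [1, 2, 3]})
search.fit(X_train, y_train)

best_pca = search.best_estimator_.named_steps["pca"]
# One row per retained component, one column per ORIGINAL feature.
print("components_ shape:", best_pca.components_.shape)
print("variance explained per component:", best_pca.explained_variance_ratio_)
```

Nothing in `best_pca` maps back to named input columns; the only link to the original features is the weight matrix `components_`.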

Recommended Answer

You seem to be confusing dimensionality reduction with feature selection. PCA is a dimensionality reduction technique: it does not select features, it looks for a lower-dimensional linear projection of the data. The resulting features are not your original ones; they are linear combinations of them. Thus, if your original features were "width", "height" and "age", after reducing to 2 dimensions with PCA you end up with features like "0.4 * width + 0.1 * height - 0.05 * age" and "0.3 * height - 0.2 * width".
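To make this concrete, here is a minimal sketch that prints each principal component as a linear combination of the original features, read from the fitted PCA's `components_` attribute. The feature names, the synthetic data and the helper `describe_components` are hypothetical, chosen to mirror the width/height/age example; the printed weights will not match the illustrative 0.4/0.1/0.05 values.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical feature names, taken from the example in the answer.
feature_names = ["width", "height", "age"]

# Synthetic stand-in data (100 samples x 3 features); not the asker's data.
rng = np.random.RandomState(0)
X = rng.normal(size=(100, 3))

# Reduce to 2 dimensions, as in the answer's example.
pca = PCA(n_components=2).fit(X)


def describe_components(fitted_pca, names):
    """Write each principal component as a weighted sum of the original features.

    PCA centers the data first, so each component is applied to
    (x - fitted_pca.mean_), not to the raw x.
    """
    formulas = []
    for weights in fitted_pca.components_:
        formulas.append(" ".join(f"{w:+.2f} * {n}" for w, n in zip(weights, names)))
    return formulas


for i, formula in enumerate(describe_components(pca, feature_names), start=1):
    print(f"PC{i} = {formula}")

# Rough interpretation aid: the original feature with the largest absolute
# weight in each component. This is NOT feature selection; every component
# still depends on all three inputs.
dominant = [feature_names[j] for j in np.abs(pca.components_).argmax(axis=1)]
print("Dominant feature per component:", dominant)
```

With the grid search from the question, you would pass `clf.best_estimator_.named_steps['pca']` as `fitted_pca`, together with the list of original column names in the same order as the columns of `features`.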
