The easiest way to get feature names after running SelectKBest in scikit-learn
Question
I would like to do supervised learning.
So far I know how to run supervised learning on all features.
However, I would also like to experiment with the K best features.
I read the documentation and found that scikit-learn provides the SelectKBest class.
Unfortunately, I am not sure how to create a new dataframe after finding those best features.
Let's assume I would like to experiment with the 5 best features:
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
select_k_best_classifier = SelectKBest(score_func=f_classif, k=5)
fit_transofrmed_features = select_k_best_classifier.fit_transform(features_dataframe, targeted_class)
Now if I add the next line:
dataframe = pd.DataFrame(fit_transofrmed_features)
I get a new dataframe without feature names (only integer column labels from 0 to 4).
I should replace it with:
dataframe = pd.DataFrame(fit_transofrmed_features, columns=features_names)
My question is: how do I create the features_names list?
I know I should use:
select_k_best_classifier.get_support()
which returns an array of booleans.
Each True value in the array marks the position of a selected column.
How should I use this boolean array together with the array of all feature names, which I can get via:
feature_names = list(features_dataframe.columns.values)
Recommended answer
You can do the following:
mask = select_k_best_classifier.get_support()  # boolean array, one entry per input column
new_features = []  # the names of your K best features
for selected, feature in zip(mask, feature_names):
    if selected:
        new_features.append(feature)
Then set the column names of the new dataframe:
dataframe = pd.DataFrame(fit_transofrmed_features, columns=new_features)
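The loop above can also be written as a single boolean-indexing step on the column index. The sketch below is one possible end-to-end version and is not part of the original answer. The toy data, the names selector and selected_values, and the random seed are assumptions made only for illustration. It also keeps the fitted selector in its own variable, since get_support() is a method of the selector and not of the array returned by fit_transform.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif

# Hypothetical toy data: 8 numeric features, binary target driven by f0 and f3.
rng = np.random.default_rng(0)
features_dataframe = pd.DataFrame(
    rng.normal(size=(200, 8)),
    columns=[f"f{i}" for i in range(8)],
)
targeted_class = (features_dataframe["f0"] + features_dataframe["f3"] > 0).astype(int)

# Keep the fitted selector separate from the transformed array.
selector = SelectKBest(score_func=f_classif, k=5)
selected_values = selector.fit_transform(features_dataframe, targeted_class)

# get_support() returns a boolean mask aligned with the input columns,
# so it can index the column Index directly.
new_features = features_dataframe.columns[selector.get_support()]

# Rebuild a dataframe that keeps both the feature names and the row index.
dataframe = pd.DataFrame(
    selected_values,
    columns=new_features,
    index=features_dataframe.index,
)
print(dataframe.columns.tolist())
```

Passing index=features_dataframe.index is optional, but it keeps the rows aligned with the original dataframe if its index is not the default 0..n-1 range.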