Retain feature names after Scikit Feature Selection
Problem description
After I run a VarianceThreshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain the names of the remaining features. The following code:
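import pandas as pd
from sklearn.feature_selection import VarianceThreshold

def VarianceThreshold_selector(data):
    selector = VarianceThreshold(.5)
    selector.fit(data)
    # transform() returns a NumPy array, which is then wrapped in a new DataFrame
    selector = (pd.DataFrame(selector.transform(data)))
    return selector

x = VarianceThreshold_selector(data)
print(x)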
changes the following data (this is just a small subset of the rows):
Survived  Pclass  Sex  Age  SibSp  Parch  Nonsense
       0       3    1   22      1      0         0
       1       1    2   38      1      0         0
       1       3    2   26      0      0         0
into this (again, just a small subset of the rows):
   0     1  2  3
0  3  22.0  1  0
1  1  38.0  1  0
2  3  26.0  0  0
Using the get_support method, I know that these are Pclass, Age, SibSp, and Parch, so I'd rather this return something more like:
   Pclass   Age  SibSp  Parch
0       3  22.0      1      0
1       1  38.0      1      0
2       3  26.0      0      0
Is there an easy way to do this? I'm very new to Scikit-Learn, so I'm probably just doing something silly.
Recommended answer
Would something like this help? If you pass it a pandas DataFrame, it fits the selector and then uses get_support, like you mentioned, to get the indices of the columns that met the variance threshold. Those indices are used to look up the matching column headers, and the original DataFrame is indexed with them, so the returned data keeps its column names.
>>> df
   Survived  Pclass  Sex  Age  SibSp  Parch  Nonsense
0         0       3    1   22      1      0         0
1         1       1    2   38      1      0         0
2         1       3    2   26      0      0         0
>>> from sklearn.feature_selection import VarianceThreshold
>>> def variance_threshold_selector(data, threshold=0.5):
...     selector = VarianceThreshold(threshold)
...     selector.fit(data)
...     # Index the original frame so the column names are retained
...     return data[data.columns[selector.get_support(indices=True)]]
...
>>> variance_threshold_selector(df, 0.5)
   Pclass  Age
0       3   22
1       1   38
2       3   26
>>> variance_threshold_selector(df, 0.9)
   Age
0   22
1   38
2   26
>>> variance_threshold_selector(df, 0.1)
   Survived  Pclass  Sex  Age  SibSp
0         0       3    1   22      1
1         1       1    2   38      1
2         1       3    2   26      0
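If it helps, the same idea can be written as a standalone script. This is only a sketch: the function name select_by_variance is made up for illustration, the DataFrame is rebuilt by hand from the three sample rows shown in the question, and it uses the boolean mask that get_support() returns by default instead of the integer indices used above.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Sample rows copied from the question (only three rows, for illustration).
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": [1, 2, 2],
    "Age": [22, 38, 26],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Nonsense": [0, 0, 0],
})


def select_by_variance(data, threshold=0.5):
    """Return the columns of `data` whose variance exceeds `threshold`.

    Hypothetical helper name; not part of scikit-learn.
    """
    selector = VarianceThreshold(threshold=threshold)
    selector.fit(data)
    # With no arguments, get_support() returns a boolean mask with one
    # entry per input column (True means the column was kept).
    mask = selector.get_support()
    # Slicing the original frame keeps the column names and the row index.
    return data.loc[:, mask]


selected = select_by_variance(df, 0.5)
print(list(selected.columns))
```

Because the original frame is sliced rather than rebuilt from the transformed array, the integer columns also stay integers instead of being converted to floats.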