Sklearn VarianceThreshold not removing low variance features

Question

This is my first post here. If you have suggestions for asking questions more effectively, I'd like to hear them.

I am working with the Mercedes-Benz dataset, which is available on Kaggle. This dataset has 369 numerical features. After removing the target variable and the categorical features, I want to remove the low variance features. I am using Sklearn's VarianceThreshold.

I will include the code, although the steps seem straightforward. I have played around with the threshold parameter, but every time I print the shape of the transformed dataset it still has the same 369 features.

If anyone sees where I am going wrong, I would appreciate the help!

X = df.iloc[:, df.columns != 'y']
Y = df.iloc[:, df.columns == 'y']
print(X.shape)
print(Y.shape)

(4209, 377)
(4209, 1)

X_cat = X.select_dtypes(include = 'object')
X_num = X.select_dtypes(include = 'int64')
print(X_cat.shape)
print(X_num.shape)

(4209, 8)
(4209, 369)

X_num.var().sort_values()

X268    0.000000e+00
X297    0.000000e+00
X290    0.000000e+00
X289    0.000000e+00
X330    0.000000e+00
            ...     
X191    2.492121e-01
X362    2.496467e-01
X337    2.497867e-01
X127    2.500357e-01
ID      5.941936e+06
Length: 369, dtype: float64

from sklearn.feature_selection import VarianceThreshold
VT = VarianceThreshold()
VT.fit_transform(X_num)
print(X_num.shape)

(4209, 369)

Recommended answer

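You are not keeping the transformed data. fit_transform returns a new array and leaves X_num unchanged, so the result here is discarded: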

from sklearn.feature_selection import VarianceThreshold
# Create the VarianceThreshold selector (default threshold=0.0)
VT = VarianceThreshold()
# Fit and transform, but the returned array is not saved
VT.fit_transform(X_num)

So you have to change it to:

from sklearn.feature_selection import VarianceThreshold
# Create the VarianceThreshold selector (default threshold=0.0)
VT = VarianceThreshold()
# Fit and transform, and save the result back into X_num
X_num = VT.fit_transform(X_num)
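
One side effect worth noting: fit_transform returns a NumPy array by default, so assigning it back to X_num drops the column names. The following is a minimal sketch of one way to keep them, using the selector's get_support() mask. The small X_demo frame and its column names are made up for illustration and stand in for the real X_num.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical stand-in for X_num: two constant columns and two varying ones.
X_demo = pd.DataFrame({
    "const_a": [0, 0, 0, 0, 0, 0],
    "const_b": [1, 1, 1, 1, 1, 1],
    "binary": [0, 1, 0, 1, 1, 0],
    "ID": [1, 2, 3, 4, 5, 6],
})

# The default threshold=0.0 removes only columns whose values never change.
selector = VarianceThreshold()

# fit_transform returns a new array; X_demo itself is left unchanged.
reduced = selector.fit_transform(X_demo)

# get_support() is a boolean mask over the input columns, which lets us
# rebuild a DataFrame with the original column names and index.
kept_columns = X_demo.columns[selector.get_support()]
X_reduced = pd.DataFrame(reduced, columns=kept_columns, index=X_demo.index)

print(X_demo.shape)        # shape of the untouched original frame
print(X_reduced.shape)     # shape after dropping constant columns
print(list(kept_columns))  # names of the columns that survived
```

With a larger threshold the same pattern applies; only the mask returned by get_support() changes.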
