Cyclical Loop Between OneHotEncoder and KNNImpute in Scikit-learn

Views: 28 | Posted: 2021/12/14 10:04:46 | Tags: python, machine-learning, scikit-learn, preprocessor

Problem description

I'm working with a really simple dataset. It has some missing values in both categorical and numeric features. Because of this, I'm trying to use sklearn.impute.KNNImputer to get the most accurate imputation I can. However, when I run the following code:

from sklearn.impute import KNNImputer

imputer = KNNImputer(n_neighbors=120)
imputer.fit_transform(x_train)

I get the error: ValueError: could not convert string to float: 'Private'
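For readers who want to see the failure in isolation, here is a minimal sketch that reproduces it. The toy frame and its column names (`age`, `workclass`) are assumptions made for illustration and are not the asker's data, and `n_neighbors` is reduced to fit the tiny sample.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

# Hypothetical stand-in for x_train; the column names are assumptions.
x_train = pd.DataFrame(
    {
        "age": [25.0, np.nan, 47.0, 52.0],
        "workclass": pd.Series(
            ["Private", "Self-emp", np.nan, "Private"], dtype=object
        ),
    }
)

imputer = KNNImputer(n_neighbors=2)  # small k because the toy frame has 4 rows

error_message = None
try:
    imputer.fit_transform(x_train)
except ValueError as exc:
    # KNNImputer validates its input as a float array, so strings are rejected.
    error_message = str(exc)

print(error_message)
```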

That makes sense; it obviously can't handle categorical data. But when I try to run OneHotEncoder with:

from sklearn.preprocessing import OneHotEncoder

encoder = OneHotEncoder(drop="first")
encoder.fit_transform(x_train[categorical_features])

it throws the error: ValueError: Input contains NaN

I'd prefer to use KNNImputer even with the categorical data, as I feel I'd lose some accuracy if I just used a ColumnTransformer and imputed the numeric and categorical data separately. Is there any way to get OneHotEncoder to ignore these missing values? If not, is using a ColumnTransformer or a simpler imputer a better way of tackling this problem?

Thanks in advance.
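The ColumnTransformer alternative raised in the question could look roughly like the sketch below. It is only an illustration under stated assumptions: the toy frame, the column names (`age`, `hours_per_week`, `workclass`) and the small `n_neighbors` value are hypothetical, and the choice of `SimpleImputer(strategy="most_frequent")` for the categorical column is one possible "simpler imputer", not a recommendation taken from the original post.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import KNNImputer, SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data standing in for x_train (column names are assumptions).
x_train = pd.DataFrame(
    {
        "age": [25.0, np.nan, 47.0, 52.0, 38.0, 29.0],
        "hours_per_week": [40.0, 50.0, np.nan, 45.0, 40.0, 35.0],
        "workclass": pd.Series(
            ["Private", "Self-emp", np.nan, "Private", "Private", "Self-emp"],
            dtype=object,
        ),
    }
)
numeric_features = ["age", "hours_per_week"]
categorical_features = ["workclass"]

# Categorical columns: fill NaN with the most frequent label, then one-hot encode.
categorical_pipeline = Pipeline(
    steps=[
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(drop="first")),
    ]
)

# Numeric columns: KNN imputation (k is small only because the toy frame is small).
preprocessor = ColumnTransformer(
    transformers=[
        ("num", KNNImputer(n_neighbors=2), numeric_features),
        ("cat", categorical_pipeline, categorical_features),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

x_train_prepared = preprocessor.fit_transform(x_train)
print(x_train_prepared.shape)
```

Note the trade-off the question already anticipates: in this arrangement the categorical column is imputed without looking at the numeric features, and the KNN step sees only the numeric columns.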
