Predicting missing values with scikit-learn's Imputer module


Question

I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class.

I have made a NumPy array, created an Imputer object with strategy='mean', and called fit_transform() on the NumPy array.

When I print the array after calling fit_transform(), the 'NaN' entries are still there, and I don't get any predicted values.

What am I doing wrong here? How do I go about predicting the missing values?

import numpy as np
from sklearn.preprocessing import Imputer

X = np.array([[23.56],[53.45],['NaN'],[44.44],[77.78],['NaN'],[234.44],[11.33],[79.87]])

print(X)

imp = Imputer(missing_values='NaN', strategy='mean', axis=0)
imp.fit_transform(X)

print(X)

Recommended answer

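Per the documentation, sklearn.preprocessing.Imputer.fit_transform returns a new array; it does not modify the array passed in. The code in the question discards that return value, so X still holds the original data when it is printed. The minimal fix is therefore to assign the result back: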

X = imp.fit_transform(X)
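
On newer scikit-learn releases the same idea can be sketched as below. This assumes version 0.22 or later, where the Imputer class is no longer available and sklearn.impute.SimpleImputer takes its place. The name X_filled is introduced here only for illustration, and the missing entries are written as np.nan instead of the string 'NaN' so that the array stays numeric. Treat it as a sketch under those assumptions, not as the original poster's code.

```python
import numpy as np
from sklearn.impute import SimpleImputer  # assumes scikit-learn >= 0.22

# Same data as in the question, with np.nan (a float) marking the gaps.
X = np.array([[23.56], [53.45], [np.nan], [44.44], [77.78],
              [np.nan], [234.44], [11.33], [79.87]])

# SimpleImputer works column-wise; it has no `axis` parameter.
imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# fit_transform returns a new array; X itself is left unchanged,
# so the result has to be kept in a variable.
X_filled = imp.fit_transform(X)

print(X_filled)
```

With strategy="mean", both missing entries are replaced by the mean of the seven observed values in the column (roughly 74.98), while X itself still contains the two NaN entries.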
