Data imputation with fancyimpute and pandas

Question

I have a large pandas data frame df. It has quite a few missing values. Dropping rows or columns is not an option. Imputing medians, means, or the most frequent values is not an option either (hence imputation with pandas and/or scikit-learn unfortunately doesn't do the trick).

I came across what seems to be a neat package called fancyimpute (you can find it here). But I have some problems with it.

Here is what I do:

# the necessary imports
import pandas as pd
import numpy as np
from fancyimpute import KNN

# df is my data frame with the missing values. I keep only the float columns
# ("float64" replaces np.float, which was removed in NumPy 1.24)
df_numeric = df.select_dtypes(include=["float64"])

# I now run fancyimpute KNN, 
# it returns a np.array which I store as a pandas dataframe
df_filled = pd.DataFrame(KNN(3).complete(df_numeric))

However, df_filled somehow ends up as a single vector instead of the filled data frame. How do I get hold of the data frame with the imputed values?

I realized that fancyimpute needs a NumPy array. I therefore converted df_numeric to an array using as_matrix().

# df is my data frame with the missing values. I keep only the float columns
# (as_matrix() was removed in pandas 1.0; use .to_numpy() on current versions)
df_numeric = df.select_dtypes(include=["float64"]).as_matrix()

# I now run fancyimpute KNN, 
# it returns a np.array which I store as a pandas dataframe
df_filled = pd.DataFrame(KNN(3).complete(df_numeric))

The output is a data frame whose column labels have gone missing. Is there any way to retrieve the labels?

Recommended answer

df = pd.DataFrame(data=mice.complete(d), columns=d.columns, index=d.index)

The .complete() method of the fancyimpute object (be it mice or KNN) returns a plain np.array, which carries no labels. That array is passed as the content (argument data=) of a new pandas data frame, and the columns= and index= arguments are taken from the original data frame d, so the result has the same column labels and index as the original.
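The relabeling step can be tried without fancyimpute installed. The following is a minimal sketch, not the answerer's code: impute_to_array is a hypothetical stand-in that fills with column means purely so the example runs with only NumPy and pandas, and the sample frame and its labels are made up. With fancyimpute, the stand-in call would be replaced by something like KNN(3).complete(...) from the question (newer fancyimpute releases appear to use fit_transform instead of complete; treat that as an assumption to check against the installed version).

```python
import numpy as np
import pandas as pd


def impute_to_array(values: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for an imputer such as fancyimpute's KNN.

    It fills NaNs with column means only to keep the sketch dependency-free.
    What matters here is that it returns a bare ndarray without labels.
    """
    filled = values.copy()
    col_means = np.nanmean(filled, axis=0)
    rows, cols = np.where(np.isnan(filled))
    filled[rows, cols] = col_means[cols]
    return filled


# Made-up sample data with a non-default index and one non-numeric column.
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0], "b": [4.0, 5.0, np.nan], "name": ["x", "y", "z"]},
    index=["r1", "r2", "r3"],
)

# Keep only the float columns, as in the question.
df_numeric = df.select_dtypes(include=["float64"])

# Pass a plain array in, then reattach the original labels to the result.
df_filled = pd.DataFrame(
    data=impute_to_array(df_numeric.to_numpy()),
    columns=df_numeric.columns,
    index=df_numeric.index,
)

# Optionally write the imputed columns back next to the non-numeric ones.
df_result = df.copy()
df_result[df_filled.columns] = df_filled
```

Passing index= as well as columns= matters whenever the original frame does not use the default 0..n-1 index; otherwise the assignment back into df_result would align on the wrong labels.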
