Pandas: General Data Imputation Based on Column Dtype

Views: 63 Published: 2020/5/24 3:54:55 Tags: python, pandas

Question

I'm working with a dataset with ~80 columns, many of which contain NaN. I definitely don't want to manually inspect dtype for each column and impute based on that.

So I wrote a function to impute a column's missing values based on its dtype:

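import pandas as pd

def impute_df(df, col):
    # Numeric column (int or float): fill NaN with the column mean.
    # A column holding NaN is float64, so a check against "int64" never matches it.
    if pd.api.types.is_numeric_dtype(df[col]):
        df[col] = df[col].fillna(df[col].mean())
    # Any other dtype: fill NaN with the most frequent value
    else:
        df[col] = df[col].fillna(df[col].mode()[0])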

But to use this, I'd have to loop over all columns in my DataFrame, something like:

for col in train_df.columns:
    impute_df(train_df, col)

And I know looping in Pandas is generally slow. Is there a better way of going about this?

Thanks!

Recommended answer

I think you need select_dtypes to split the numeric and non-numeric columns, then fillna on each group, using the column means for the first and the column modes for the second:

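import numpy as np
import pandas as pd

# Sample frame: B and C are numeric with NaN, F is non-numeric with NaN
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [np.nan, 5, 4, 5, 5, 4],
                   'C': [7, 8, np.nan, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4],
                   'F': ['a', 'a', 'b', 'b', 'b', np.nan]})

print(df)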

   A    B    C  D  E    F
0  a  NaN  7.0  1  5    a
1  b  5.0  8.0  3  3    a
2  c  4.0  NaN  5  6    b
3  d  5.0  4.0  7  9    b
4  e  5.0  2.0  1  2    b
5  f  4.0  3.0  0  4  NaN

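# Split the column labels into numeric and non-numeric groups
cols1 = df.select_dtypes(include=[np.number]).columns
cols2 = df.select_dtypes(exclude=[np.number]).columns
# Numeric columns: fill NaN with each column's mean
df[cols1] = df[cols1].fillna(df[cols1].mean())
# Other columns: fill NaN with each column's most frequent value (first row of mode())
df[cols2] = df[cols2].fillna(df[cols2].mode().iloc[0])
print(df)
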
   A    B    C  D  E  F
0  a  4.6  7.0  1  5  a
1  b  5.0  8.0  3  3  a
2  c  4.0  4.8  5  6  b
3  d  5.0  4.0  7  9  b
4  e  5.0  2.0  1  2  b
5  f  4.0  3.0  0  4  b
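
The same idea can be wrapped in a reusable helper. The following is only a sketch, not part of the original answer: the function names dtype_fill_values and impute_by_dtype are made up for illustration, and it assumes that "numeric" means anything matched by np.number. It collects one fill value per column into a dict and passes that dict to a single DataFrame.fillna call, which fills each column by its label and returns a new frame instead of modifying the input.

```python
import numpy as np
import pandas as pd


def dtype_fill_values(df):
    """Hypothetical helper: map column name -> fill value (mean or mode)."""
    num_cols = df.select_dtypes(include=[np.number]).columns
    other_cols = df.select_dtypes(exclude=[np.number]).columns

    # Mean per numeric column; all-NaN columns yield NaN and are dropped.
    fill_values = df[num_cols].mean().dropna().to_dict()

    # First row of mode() holds the most frequent value of each column.
    modes = df[other_cols].mode()
    if not modes.empty:
        fill_values.update(modes.iloc[0].dropna().to_dict())
    return fill_values


def impute_by_dtype(df, fill_values=None):
    """Hypothetical helper: return a copy of df with NaN filled per column."""
    if fill_values is None:
        fill_values = dtype_fill_values(df)
    if not fill_values:
        # Nothing to fill with (e.g. an empty frame); hand back a copy.
        return df.copy()
    return df.fillna(value=fill_values)


df = pd.DataFrame({'A': list('abcdef'),
                   'B': [np.nan, 5, 4, 5, 5, 4],
                   'C': [7, 8, np.nan, 4, 2, 3],
                   'F': ['a', 'a', 'b', 'b', 'b', np.nan]})
filled = impute_by_dtype(df)
print(filled)
```

Keeping the fill values separate from the fill step also means they could be computed once on train_df and reused on another frame with the same columns, for example impute_by_dtype(test_df, dtype_fill_values(train_df)), where test_df is a hypothetical second frame.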
