Pandas DataFrame-将NULL字符串替换为空白,并将NULL数值替换为0 [英] Pandas DataFrame - Replace NULL String with Blank and NULL Numeric with 0

查看:4073
本文介绍了Pandas DataFrame-将NULL字符串替换为空白,并将NULL数值替换为0的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在处理一个大型数据集,其中包含许多不同类型的列.数字值和带有一些NULL值的字符串混合在一起.我需要根据类型将NULL值更改为Blank或0.

I am working on a large dataset with many columns of different types. There are a mix of numeric values and strings with some NULL values. I need to change the NULL Value to Blank or 0 depending on the type.

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9   NULL   NULL NULL  8   NULL   NULL Lemon    12  NULL

我希望它看起来像这样

1   John   2    Doe   3   Mike   4    Orange   5   Stuff
9          0          8          0    Lemon    12  

我可以为每个人执行此操作,但是由于我要提取几百个具有列的超大型数据集,因此我想采用其他方法.

I can do this for each individual, but since I am going to be pulling several extremely large datasets with hundreds of columns, I'd like to do this some other way.

来自较小数据集的类型,

Types from Smaller Dataset,

Field1              object
Field2              object
Field3              object
Field4              object
Field5              object
Field6              object
Field7              object
Field8              object
Field9              object
Field10              float64
Field11              float64
Field12              float64
Field13              float64
Field14              float64
Field15              object
Field16              float64
Field17              object
Field18              object
Field19              float64
Field20              float64
Field21              int64

推荐答案

使用 DataFrame.select_dtypes 用于数字列,按子集过滤并将值替换为0,然后将所有其他列重新放置为空字符串:

Use DataFrame.select_dtypes for numeric columns, filter by subset and replace values to 0, then repalce all another columns to empty string:

print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9   NaN  NaN  NaN  8   NaN  NaN   Lemon  12    NaN

print (df.dtypes)
0      int64
1     object
2    float64
3     object
4      int64
5     object
6    float64
7     object
8      int64
9     object
dtype: object

c = df.select_dtypes(np.number).columns
df[c] = df[c].fillna(0)
df = df.fillna("")
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12       

另一种解决方案是创建替换字典:

Another solution is create dictionary for replace:

num_cols = df.select_dtypes(np.number).columns
d1 = dict.fromkeys(num_cols, 0)
d2 = dict.fromkeys(df.columns.difference(num_cols), "")

d  = {**d1,  **d2}
print (d)
{0: 0, 2: 0, 4: 0, 6: 0, 8: 0, 1: '', 3: '', 5: '', 7: '', 9: ''}

df = df.fillna(d)
print (df)
   0     1    2    3  4     5    6       7   8      9
0  1  John  2.0  Doe  3  Mike  4.0  Orange   5  Stuff
1  9        0.0       8        0.0   Lemon  12       

这篇关于Pandas DataFrame-将NULL字符串替换为空白,并将NULL数值替换为0的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆