How to fill null values in a dataset using Python, matching on two other columns?

Problem description

I have a Titanic dataset. I was working mainly with three of its attributes: 1. Age 2. Embarked (the port from which each passenger embarked; there are three ports in total: S, Q and C) 3. Survived (0 for did not survive, 1 for survived)

I was filtering out the useless data. Then I needed to fill the null values present in Age. So I counted how many passengers survived and did not survive for each value of Embarked, i.e. S, Q and C.

I found the mean age of the passengers who survived and of those who did not survive for each of the ports S, Q and C. But now I have no idea how to fill these 6 values (3 for survivors from S, Q and C, and 3 for non-survivors from S, Q and C, so 6 in total) into the original titanic Age column. If I simply do titanic.Age.fillna('one of the six values'), it will fill all the null values of Age with that one value, which I don't want.

After spending some time on it, I tried this:

titanic[titanic.Survived==1][titanic.Embarked=='S'].Age.fillna(SurvivedS.Age.mean(),inplace=True)
titanic[titanic.Survived==1][titanic.Embarked=='Q'].Age.fillna(SurvivedQ.Age.mean(),inplace=True)
titanic[titanic.Survived==1][titanic.Embarked=='C'].Age.fillna(SurvivedC.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='S'].Age.fillna(DidntSurvivedS.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='Q'].Age.fillna(DidntSurvivedQ.Age.mean(),inplace=True)
titanic[titanic.Survived==0][titanic.Embarked=='C'].Age.fillna(DidntSurvivedC.Age.mean(),inplace=True)

This showed no error, but it still doesn't work. Any idea what I should do?

Recommended answer

I think you need groupby with apply, filling the missing values of each group with that group's mean via fillna:

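# parentheses allow the chained call to span two lines;
# group_keys=False keeps the original row index so the result aligns on assignment
titanic['age'] = (titanic.groupby(['survived','embarked'], group_keys=False)['age']
                         .apply(lambda x: x.fillna(x.mean())))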
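
As a side note on why the attempt in the question ran without error yet changed nothing: boolean indexing such as titanic[titanic.Survived==1] builds a new DataFrame, so fillna ends up filling a temporary object rather than the original. The sketch below illustrates this on a small made-up frame (the name `df` and its values are hypothetical, not taken from the Titanic data) and shows one way to write back through .loc instead.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Titanic columns
df = pd.DataFrame({
    "Survived": [1, 1, 0, 0],
    "Embarked": ["S", "S", "S", "S"],
    "Age": [20.0, np.nan, 40.0, np.nan],
})

group = (df.Survived == 1) & (df.Embarked == "S")

# Boolean indexing returns a new object, so filling it leaves df untouched
subset = df[group]
filled = subset.Age.fillna(subset.Age.mean())
print(filled.isnull().sum())   # missing ages left in the filled copy
print(df.Age.isnull().sum())   # missing ages left in the original frame

# Assigning through .loc on the original frame does modify it
df.loc[group & df.Age.isnull(), "Age"] = df.loc[group, "Age"].mean()
print(df)
```

Repeating the .loc assignment for all six (Survived, Embarked) combinations would work, but the groupby approach handles every group in one statement.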


import seaborn as sns

titanic = sns.load_dataset('titanic')
#check NaN rows in age
print (titanic[titanic['age'].isnull()].head(10))
    survived  pclass     sex  age  sibsp  parch      fare embarked   class  \
5          0       3    male  NaN      0      0    8.4583        Q   Third   
17         1       2    male  NaN      0      0   13.0000        S  Second   
19         1       3  female  NaN      0      0    7.2250        C   Third   
26         0       3    male  NaN      0      0    7.2250        C   Third   
28         1       3  female  NaN      0      0    7.8792        Q   Third   
29         0       3    male  NaN      0      0    7.8958        S   Third   
31         1       1  female  NaN      1      0  146.5208        C   First   
32         1       3  female  NaN      0      0    7.7500        Q   Third   
36         1       3    male  NaN      0      0    7.2292        C   Third   
42         0       3    male  NaN      0      0    7.8958        C   Third   

      who  adult_male deck  embark_town alive  alone  
5     man        True  NaN   Queenstown    no   True  
17    man        True  NaN  Southampton   yes   True  
19  woman       False  NaN    Cherbourg   yes   True  
26    man        True  NaN    Cherbourg    no   True  
28  woman       False  NaN   Queenstown   yes   True  
29    man        True  NaN  Southampton    no   True  
31  woman       False    B    Cherbourg   yes  False  
32  woman       False  NaN   Queenstown   yes   True  
36    man        True  NaN    Cherbourg   yes   True  
42    man        True  NaN    Cherbourg    no   True 


idx = titanic[titanic['age'].isnull()].index
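# parentheses allow the chained call to span two lines;
# group_keys=False keeps the original row index so the result aligns on assignment
titanic['age'] = (titanic.groupby(['survived','embarked'], group_keys=False)['age']
                         .apply(lambda x: x.fillna(x.mean())))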

#check if values were replaced
print (titanic.loc[idx].head(10))
    survived  pclass     sex        age  sibsp  parch      fare embarked  \
5          0       3    male  30.325000      0      0    8.4583        Q   
17         1       2    male  28.113184      0      0   13.0000        S   
19         1       3  female  28.973671      0      0    7.2250        C   
26         0       3    male  33.666667      0      0    7.2250        C   
28         1       3  female  22.500000      0      0    7.8792        Q   
29         0       3    male  30.203966      0      0    7.8958        S   
31         1       1  female  28.973671      1      0  146.5208        C   
32         1       3  female  22.500000      0      0    7.7500        Q   
36         1       3    male  28.973671      0      0    7.2292        C   
42         0       3    male  33.666667      0      0    7.8958        C   

     class    who  adult_male deck  embark_town alive  alone  
5    Third    man        True  NaN   Queenstown    no   True  
17  Second    man        True  NaN  Southampton   yes   True  
19   Third  woman       False  NaN    Cherbourg   yes   True  
26   Third    man        True  NaN    Cherbourg    no   True  
28   Third  woman       False  NaN   Queenstown   yes   True  
29   Third    man        True  NaN  Southampton    no   True  
31   First  woman       False    B    Cherbourg   yes  False  
32   Third  woman       False  NaN   Queenstown   yes   True  
36   Third    man        True  NaN    Cherbourg   yes   True  
42   Third    man        True  NaN    Cherbourg    no   True  


#check mean values
print (titanic.groupby(['survived','embarked'])['age'].mean())
survived  embarked
0         C           33.666667
          Q           30.325000
          S           30.203966
1         C           28.973671
          Q           22.500000
          S           28.113184
Name: age, dtype: float64
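
An alternative sketch, not part of the original answer: groupby(...).transform('mean') returns a Series of group means aligned with the original rows, which can be passed straight to fillna. The frame `toy` and its values below are made up for illustration; the column names simply mirror the seaborn dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with the same column names as the seaborn dataset
toy = pd.DataFrame({
    "survived": [0, 0, 0, 1, 1, 1],
    "embarked": ["S", "S", "S", "C", "C", "C"],
    "age": [30.0, np.nan, 40.0, 20.0, 10.0, np.nan],
})

# One mean per (survived, embarked) group, broadcast back to every row
group_mean = toy.groupby(["survived", "embarked"])["age"].transform("mean")

# fillna with a Series fills each missing age from the matching row position
toy["age"] = toy["age"].fillna(group_mean)
print(toy)
```

Because fillna only touches missing entries, ages that are already present are never overwritten.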
