How to take a slice of a DataFrame and "fillna" only that slice using Python Pandas?

Question

The problem: take the Titanic dataset from Kaggle. I have a DataFrame with the columns "Pclass", "Sex" and "Age". I need to fill the NaN values in the "Age" column with the median of a specific group. For example, for a woman in 1st class, I would like to fill her age with the median age of 1st-class women, not with the median of the whole "Age" column.

The question is: how do I make this change only within a certain slice?

I tried:

data['Age'][(data['Sex'] == 'female')&(data['Pclass'] == 1)&(data['Age'].isnull())].fillna(median)

where "median" is my value, but nothing changes. Passing "inplace=True" didn't help either.

Thanks a lot!

Recommended answer

I believe you need to filter by a boolean mask and assign the result back:

import numpy as np
import pandas as pd

data = pd.DataFrame({'a':list('aaaddd'),
                     'Sex':['female','female','male','female','female','male'],
                     'Pclass':[1,2,1,2,1,1],
                     'Age':[40,20,30,20,np.nan,np.nan]})

print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4   NaN       1  female  d
5   NaN       1    male  d

#boolean mask
mask1 = (data['Sex'] == 'female')&(data['Pclass'] == 1)

#get median by mask without NaNs
med = data.loc[mask1, 'Age'].median()
print (med)
40.0

#replace NaNs
data.loc[mask1, 'Age'] = data.loc[mask1, 'Age'].fillna(med)
print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4  40.0       1  female  d
5   NaN       1    male  d

This is the same as:

mask2 = mask1 &(data['Age'].isnull())

data.loc[mask2, 'Age'] = med
print (data)
    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4  40.0       1  female  d
5   NaN       1    male  d
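
As a rough illustration of why the attempt in the question appears to do nothing, the sketch below contrasts it with the .loc assignment. It rebuilds a reduced copy of the sample data; the names filled_copy, nan_count_after_chained and nan_count_after_loc are introduced here for illustration only. The key assumption is that selecting with a boolean mask returns a new Series, so fillna (with or without inplace=True) only ever touches that temporary object.

```python
import numpy as np
import pandas as pd

# Reduced copy of the sample data used in the answer.
data = pd.DataFrame({'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
                     'Pclass': [1, 2, 1, 2, 1, 1],
                     'Age': [40, 20, 30, 20, np.nan, np.nan]})

mask = (data['Sex'] == 'female') & (data['Pclass'] == 1)
med = data.loc[mask, 'Age'].median()

# Boolean indexing returns a new Series, so fillna fills a copy;
# unless the result is assigned back, `data` is left untouched.
filled_copy = data['Age'][mask & data['Age'].isnull()].fillna(med)
nan_count_after_chained = data['Age'].isnull().sum()

# Assigning through .loc writes into the original DataFrame.
data.loc[mask & data['Age'].isnull(), 'Age'] = med
nan_count_after_loc = data['Age'].isnull().sum()

print(nan_count_after_chained, nan_count_after_loc)
```

The first count still includes both missing ages, while after the .loc assignment only the 1st-class male row remains unfilled.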

Edit:

If you need to replace the NaNs in every group with that group's median:

#transform keeps the original index, so the result aligns with data on assignment
data['Age'] = data.groupby(["Sex","Pclass"])["Age"].transform(lambda x: x.fillna(x.median()))
print (data)

    Age  Pclass     Sex  a
0  40.0       1  female  a
1  20.0       2  female  a
2  30.0       1    male  a
3  20.0       2  female  d
4  40.0       1  female  d
5  30.0       1    male  d
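
A possible variant of the per-group fill, shown only as a sketch: compute each row's group median once with transform('median') and pass that Series to fillna. The name group_median is introduced here for illustration. One caveat to keep in mind is that a group whose ages are all missing has a NaN median, so its rows would stay unfilled.

```python
import numpy as np
import pandas as pd

# Same sample data as in the answer (without the unused 'a' column).
data = pd.DataFrame({'Sex': ['female', 'female', 'male', 'female', 'female', 'male'],
                     'Pclass': [1, 2, 1, 2, 1, 1],
                     'Age': [40, 20, 30, 20, np.nan, np.nan]})

# Median age of each (Sex, Pclass) group, broadcast back to every row.
group_median = data.groupby(['Sex', 'Pclass'])['Age'].transform('median')

# fillna aligns on the index, so each NaN takes its own group's median.
data['Age'] = data['Age'].fillna(group_median)
print(data)
```

This avoids calling a Python lambda per group, which may matter on larger data such as the full Titanic set.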
