Understanding the execution of DataFrame in Python

Problem description

I am new to Python and I want to understand how execution takes place in a DataFrame. Let's try this with an example from the Titanic: Machine Learning from Disaster dataset on kaggle.com. I wanted to replace each NaN value with the mean() for the respective sex, i.e. the NaN values for men should be replaced by the mean of the men's ages, and likewise for women. I achieved this with the following line of code:

_data['new_age']=_data['new_age'].fillna(_data.groupby('Sex')['Age'].transform('mean'))

My question is: while executing the code, how does the line know that a particular row belongs to a male, so that its NaN value should be replaced by the male mean(), while a female row's NaN should be replaced by the female mean()?

Recommended answer

It's because of groupby + transform. When you group with an aggregation that returns a scalar per group, a normal groupby collapses the result to a single row for each unique grouping key.

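import numpy as np
import pandas as pd

np.random.seed(42)
# np.nan is used here because the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': np.random.choice([1, 10, 11, 13, np.nan], 10)},
                  index=list('ABCDEFGHIJ'))
df.groupby('Sex')['Age'].mean()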

#Sex
#F    10.5                # One F row
#M    11.5                # One M row
#Name: Age, dtype: float64

Using transform broadcasts this result back to the original index, based on the group each row belongs to.

df.groupby('Sex')['Age'].transform('mean')

#A    11.5  # Belonged to M
#B    10.5  # Belonged to F
#C    11.5  # Belonged to M
#D    11.5
#E    10.5
#F    10.5
#G    11.5
#H    11.5
#I    10.5
#J    11.5
#Name: Age, dtype: float64
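
One way to picture the broadcast, as a rough sketch rather than a description of how pandas implements it: compute the per-group means first, then look up each row's Sex in that small table. The names group_means, via_map and via_transform are made up for this illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': np.random.choice([1, 10, 11, 13, np.nan], 10)},
                  index=list('ABCDEFGHIJ'))

# One mean per group, indexed by the grouping key ('F', 'M').
group_means = df.groupby('Sex')['Age'].mean()

# Look up each row's 'Sex' in that table; the result keeps df's index (A..J).
via_map = df['Sex'].map(group_means)

# transform('mean') yields the same values on the same index.
via_transform = df.groupby('Sex')['Age'].transform('mean')

# Side-by-side view of both results, one row per original row.
print(pd.DataFrame({'via_map': via_map, 'via_transform': via_transform}))
```

Both columns carry one value per original row, which is what lets the result line up with the Age column afterwards.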

To make it crystal clear, I'll assign the transformed result back as a new column, and now you can see how .fillna gets the correct mean. Because the transformed Series has the same index as the DataFrame, each NaN is filled with the value sitting in its own row.

df['Sex_mean'] = df.groupby('Sex')['Age'].transform('mean')


  Sex   Age  Sex_mean
A   M  13.0      11.5
B   F   NaN      10.5  # NaN will be filled with 10.5
C   M  11.0      11.5
D   M   NaN      11.5  # NaN will be filled with 11.5
E   F   NaN      10.5  # NaN will be filled with 10.5
F   F  10.0      10.5
G   M  11.0      11.5
H   M  11.0      11.5
I   F  11.0      10.5
J   M   NaN      11.5  # NaN will be filled with 11.5
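
The last step can be checked directly. The sketch below types in the Age values from the table above (so no random seed is involved) and shows that fillna, when given a Series, matches on index labels rather than on position. The names sex_mean, filled, shuffled and filled_shuffled are illustrative only.

```python
import numpy as np
import pandas as pd

# Same Sex/Age values as the table above, entered by hand.
df = pd.DataFrame({'Sex': list('MFMMFFMMFM'),
                   'Age': [13, np.nan, 11, np.nan, np.nan, 10, 11, 11, 11, np.nan]},
                  index=list('ABCDEFGHIJ'))

# Per-row group mean, indexed A..J like df itself.
sex_mean = df.groupby('Sex')['Age'].transform('mean')

# fillna with a Series matches on index labels:
# the NaN in row 'B' is filled from sex_mean['B'], and so on.
filled = df['Age'].fillna(sex_mean)

# Shuffling the filler's row order changes nothing, because matching is by label.
shuffled = sex_mean.sample(frac=1, random_state=0)
filled_shuffled = df['Age'].fillna(shuffled)

print(filled)
print(filled.equals(filled_shuffled))
```

Rows that already had an age keep it; only the NaN rows pick up their group's mean.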
