How to replace NaN values where other columns meet certain criteria?
Problem description
I am working on the Titanic dataset from Kaggle and am trying to replace the NaN values in one column based on information from the other columns.
In my specific example, I am trying to replace the unknown age of male, 1st class passengers with the average age of male, 1st class passengers.
How can I do this?
I have been able to segment the data and replace the null values in that new dataframe, but the change doesn't carry over to the original dataframe, and I am unclear on how to make it do so.
Here is my code:
missingage_1stclass_male = pd.DataFrame(
    titanic[
        (titanic['Age'].isnull()) &
        (titanic['Pclass'] == 1) &
        (titanic['Sex'] == 'male')
    ]
)
missingage_1stclass_male.Age.fillna(40.5, inplace=True)
My original dataframe with all the values is named titanic.
Recommended answer
I am trying to replace the unknown age of male, 1st class passengers with the average age of male, 1st class passengers.
You can split the problem into two steps. First, calculate the average age of male, 1st class passengers (df here stands for your titanic dataframe):
mask = (df['Pclass'] == 1) & (df['Sex'] == 'male')
avg_filler = df.loc[mask, 'Age'].mean()
Then update the values that satisfy your criteria:
df.loc[df['Age'].isnull() & mask, 'Age'] = avg_filler
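The two steps above can be tried end to end with a minimal sketch like the one below. The small dataframe is made-up toy data standing in for the Kaggle file, not the real dataset; only the column names Pclass, Sex and Age come from the question.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (values are invented for illustration).
df = pd.DataFrame({
    "Pclass": [1, 1, 1, 1, 3, 1],
    "Sex": ["male", "male", "male", "female", "male", "male"],
    "Age": [40.0, 50.0, np.nan, np.nan, np.nan, 30.0],
})

# Step 1: select male, 1st class passengers and average their known ages.
# Series.mean() skips NaN values by default.
mask = (df["Pclass"] == 1) & (df["Sex"] == "male")
avg_filler = df.loc[mask, "Age"].mean()

# Step 2: assign through .loc on the original frame, so the change is
# written to df itself rather than to a separate filtered copy.
df.loc[df["Age"].isnull() & mask, "Age"] = avg_filler

print(df)
```

Only the missing age in the male, 1st class group is filled; the missing ages of the female 1st class passenger and the male 3rd class passenger stay NaN. The approach in the question did not carry over because boolean indexing such as titanic[...] returns a new dataframe, so fillna on it never touches titanic.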