Python: replacing outlier values with median values
Problem description
I have a pandas DataFrame that contains some outlier values. I would like to replace them with the median the data would have had if those values were not there.
id Age
10236 766105
11993 288
9337 205
38189 88
35555 82
39443 75
10762 74
33847 72
21194 70
39450 70
So I want to replace all values > 75 with the median of the remaining data, i.e., the median of 70, 70, 72, 74, 75.
I'm trying to do the following:
- Replace all values greater than 75 with 0.
- Replace the 0s with the median value.
But somehow the code below is not working:
df['age'].replace(df.age>75,0,inplace=True)
Recommended answer
I think this is what you are looking for. You can use loc to compute the median of the non-outlier rows and to mark the outliers as NaN, then fill the NaN values with that median.
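import numpy as np
# Median of the non-outlier values; <= keeps 75 itself, as the question asks
median = df.loc[df['Age'] <= 75, 'Age'].median()
df.loc[df['Age'] > 75, 'Age'] = np.nan   # mark outliers as missing
df['Age'] = df['Age'].fillna(median)     # fill only the Age column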
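As a quick check against the sample data from the question, here is a minimal, self-contained sketch of the loc/fillna approach. The `threshold` variable and the explicit cast to float are assumptions added for illustration. The cast avoids writing NaN into an integer column, which some pandas versions warn about. The comparison uses `<= 75` so that 75 itself counts as a regular value, matching the question's list 70, 70, 72, 74, 75.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [10236, 11993, 9337, 38189, 35555, 39443, 10762, 33847, 21194, 39450],
    "Age": [766105, 288, 205, 88, 82, 75, 74, 72, 70, 70],
})

threshold = 75  # cut-off taken from the question (illustrative variable name)

# Median of the values that are NOT outliers (75 itself is kept).
median = df.loc[df["Age"] <= threshold, "Age"].median()

# Work on a float column so it can hold NaN.
df["Age"] = df["Age"].astype(float)

# Step 1: mark the outliers as missing.
df.loc[df["Age"] > threshold, "Age"] = float("nan")

# Step 2: fill the missing entries with the median computed above.
df["Age"] = df["Age"].fillna(median)

print(median)               # 72.0
print(df["Age"].tolist())   # [72.0, 72.0, 72.0, 72.0, 72.0, 75.0, 74.0, 72.0, 70.0, 70.0]
```

Computing the median before touching the column matters: once the outliers are overwritten, they would otherwise still have influenced the statistic.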
You can also use np.where in one line, reusing the median computed above:
df["Age"] = np.where(df["Age"] > 75, median, df["Age"])
You can also use .mask, i.e.:
df["Age"] = df["Age"].mask(df["Age"] > 75, median)
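To make the idea reusable, the .mask variant can be wrapped in a small helper. This is only a sketch: the function name `replace_above_with_median` and its `upper` parameter are hypothetical and not part of pandas. It also checks that np.where yields the same values on the question's data.

```python
import numpy as np
import pandas as pd


def replace_above_with_median(values: pd.Series, upper: float) -> pd.Series:
    """Replace entries above `upper` with the median of the remaining entries.

    Hypothetical helper for illustration; not a pandas API.
    """
    is_outlier = values > upper
    # Median is computed only over the non-outlier entries.
    median = values[~is_outlier].median()
    # .mask replaces the entries where the condition is True.
    return values.mask(is_outlier, median)


# Ages copied from the question.
ages = pd.Series([766105, 288, 205, 88, 82, 75, 74, 72, 70, 70], name="Age")

cleaned = replace_above_with_median(ages, upper=75)

# The same replacement expressed with np.where, for comparison.
is_outlier = ages > 75
via_where = np.where(is_outlier, ages[~is_outlier].median(), ages)

print(cleaned.tolist())
print(via_where.tolist())
```

Both variants leave the original `ages` Series untouched and return new data, which is usually easier to reason about than modifying a column in place.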