Is there a function that can remove outliers?
Question
I found a function that detects outliers in a column, but I do not know how to remove them.
Is there a function for excluding or removing outliers from a column?
Here is the function that detects the outliers; I need help with a function that removes them:
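import numpy as np
import pandas as pd

def detect_outlier(data_1):
    """Return the values whose absolute z-score exceeds the threshold."""
    outliers = []  # local list, so repeated calls do not accumulate results
    threshold = 3
    mean_1 = np.mean(data_1)
    std_1 = np.std(data_1)
    for y in data_1:
        z_score = (y - mean_1) / std_1
        if np.abs(z_score) > threshold:
            outliers.append(y)
    return outliers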
# Print the detected outliers
outlier_datapoints = detect_outlier(df['Pre_TOTAL_PURCHASE_ADJ'])
print(outlier_datapoints)
Recommended answer
An easy solution is to compute z-scores with scipy.stats.zscore and flag the rows that exceed a cutoff:
from scipy.stats import zscore
# calculates z-score values
df["zscore"] = zscore(df["Pre_TOTAL_PURCHASE_ADJ"])
# Creates an `is_outlier` column of True/False values so that the
# DataFrame can be filtered accordingly. Note that 1.96 is the two-sided
# 95% cutoff for a normal distribution; the question's function uses 3.
df["is_outlier"] = df["zscore"].abs() >= 1.96
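Flagging the rows is only half of what the question asks for; to actually drop them, a boolean mask can be used to select the rows to keep. The following is a minimal sketch under some assumptions: the helper name `remove_outliers` and the sample data are made up for illustration, the threshold of 3 mirrors the question's function rather than the 1.96 used in the answer, and the column is assumed to contain no missing values and not to be constant. The z-score is computed with the population standard deviation, which is the default for both np.std and scipy.stats.zscore.

```python
import numpy as np
import pandas as pd


def remove_outliers(df, column, threshold=3.0):
    """Hypothetical helper: return a copy of df keeping only the rows whose
    z-score in `column` has an absolute value <= threshold."""
    values = df[column].to_numpy(dtype=float)
    # Population standard deviation (ddof=0), the np.std default
    z = (values - values.mean()) / values.std()
    keep = np.abs(z) <= threshold  # boolean mask, True for rows to keep
    return df[keep].copy()


# Made-up data: 19 ordinary values and one extreme value
df = pd.DataFrame({"Pre_TOTAL_PURCHASE_ADJ": [10.0] * 19 + [1000.0]})
cleaned = remove_outliers(df, "Pre_TOTAL_PURCHASE_ADJ")
print(len(df), len(cleaned))
```

The original DataFrame is left untouched, so the removed rows can still be inspected afterwards. With very small samples a z-score cutoff of 3 may never trigger, because a single extreme value also inflates the standard deviation it is measured against.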