Is there a function that can remove outliers?


Question

I found a function that detects outliers in a column, but I do not know how to remove them.

Is there a function for excluding or removing outliers from a column?

Here is the function that detects the outliers; I need help writing one that removes them:

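import numpy as np
import pandas as pd

def detect_outlier(data_1):
    # Use a local list so that repeated calls do not accumulate results
    outliers = []
    threshold = 3  # flag values more than 3 standard deviations from the mean
    mean_1 = np.mean(data_1)
    std_1 = np.std(data_1)

    for y in data_1:
        z_score = (y - mean_1) / std_1
        if np.abs(z_score) > threshold:
            outliers.append(y)
    return outliers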

This prints the outliers:

# Print the outliers
outlier_datapoints = detect_outlier(df['Pre_TOTAL_PURCHASE_ADJ'])
print(outlier_datapoints)

Recommended answer

An easy solution would be to use scipy.stats.zscore:

from scipy.stats import zscore

# Compute the z-score of each value in the column
df["zscore"] = zscore(df["Pre_TOTAL_PURCHASE_ADJ"])

# Create an `is_outlier` column of True/False values
# so that you can filter your dataframe accordingly
df["is_outlier"] = df["zscore"].abs() >= 1.96
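The answer flags the outliers but leaves the actual removal to the reader. A minimal sketch of that last step follows, using boolean indexing on the dataframe. The helper name `remove_outliers` and the sample data are hypothetical and not part of the original question or answer. The default threshold of 3 matches the question's `detect_outlier`; pass `threshold=1.96` to match the cutoff used in the answer. The z-score is computed with the population standard deviation (`ddof=0`), which is the default for both `np.std` and `scipy.stats.zscore`.

```python
import pandas as pd

def remove_outliers(df, column, threshold=3.0):
    """Return a copy of df without the rows whose |z-score| in `column` exceeds threshold.

    Hypothetical helper for illustration only.
    """
    values = df[column]
    std = values.std(ddof=0)  # population std, same default as np.std
    if std == 0 or pd.isna(std):
        # All values identical (or no data): nothing can be flagged
        return df.copy()
    z = (values - values.mean()) / std
    # Rows with a missing value in `column` are dropped too,
    # because the comparison is False for NaN
    return df[z.abs() <= threshold].copy()

# Made-up sample data: 19 ordinary values and one extreme value
df = pd.DataFrame({"Pre_TOTAL_PURCHASE_ADJ": [10.0] * 19 + [1000.0]})
cleaned = remove_outliers(df, "Pre_TOTAL_PURCHASE_ADJ")
print(len(df), len(cleaned))  # prints: 20 19
```

Returning a filtered copy rather than modifying `df` in place keeps the original data available, which is useful when comparing different thresholds.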
