Flag outliers in the dataframe for each group

Views: 26 Published: 2021/11/16 23:17:17 Tags: python pandas apply pandas-groupby

Question

I would like to identify outliers within each group of values in a dataframe and return a dataframe with a column containing True/False for each row.

import numpy as np
import pandas as pd

data = {'Group': ['A', 'A', 'A', 'B', 'B', 'B'], 'Age': [20, 21, 19, 18, 2, 17]}
df = pd.DataFrame(data)

def flag_outlier(x):
    lower_limit = np.mean(x) - np.std(x) * 3
    upper_limit = np.mean(x) + np.std(x) * 3
    for i in x:
        if i > upper_limit or i < lower_limit:
            return True

df['Flag'] = df.groupby('Group')['Age'].apply(flag_outlier)

This code returns a column of NaN. How can the function be fixed?

The post "apply a function to a groupby function" is similar, but I cannot figure it out.

Many thanks,

Recommended answer

Change your function to the following:

def flag_outlier(x):
    lower_limit = np.mean(x) - np.std(x) * 3
    upper_limit = np.mean(x) + np.std(x) * 3
    # Element-wise comparison: one True/False per row of the group
    return (x > upper_limit) | (x < lower_limit)
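
One way to try the corrected function end to end is sketched below. It reuses the sample data from the question. The `n_std` parameter and the use of `transform` in place of `apply` are choices made for this sketch, not part of the original answer. `transform` is used here because it returns a result indexed like the original rows, so the assignment lines up row by row.

```python
import pandas as pd


def flag_outlier(x, n_std=3):
    """Mark values more than n_std standard deviations from the mean of x."""
    mean = x.mean()
    std = x.std(ddof=0)  # population std, the same default np.std uses
    return (x > mean + n_std * std) | (x < mean - n_std * std)


df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Age": [20, 21, 19, 18, 2, 17],
})

# transform applies the function per group and keeps the original row index,
# so each row receives its own flag
df["Flag"] = df.groupby("Group")["Age"].transform(flag_outlier)
print(df)
```

With only three rows per group, no value can lie more than √2 population standard deviations from its group mean, so the sample data produces no flagged rows at a threshold of 3. Larger groups or a lower `n_std` are needed before the age of 2 in group B would be marked.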

Because of the way you wrote it, your function returns only one value per group.
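
To see why one value per group leads to NaN, the sketch below uses a hypothetical stand-in function, `any_above_mean`, that also returns a single value per group. The result is indexed by the group labels rather than by the dataframe's row labels, so nothing matches when it is assigned as a column.

```python
import pandas as pd

df = pd.DataFrame({
    "Group": ["A", "A", "A", "B", "B", "B"],
    "Age": [20, 21, 19, 18, 2, 17],
})


def any_above_mean(x):
    # Hypothetical stand-in: collapses the whole group to a single bool
    return bool((x > x.mean()).any())


# One entry per group, indexed by the group labels
per_group = df.groupby("Group")["Age"].apply(any_above_mean)
print(per_group.index.tolist())

# Column assignment aligns on index labels; the group labels match
# none of the row labels 0..5, so every row is left as NaN
df["Flag"] = per_group
print(df["Flag"].isna().all())
```

Returning a boolean Series with the same index as the group, as in the corrected function, avoids this mismatch.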
