Python: Find outliers inside a list


Question

I have a list containing an arbitrary number of integers and/or floats. I want to find the exceptions among these numbers (I hope those are the right words for it). For example:

list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]

  • 90% to 99% of the integer values are between 1 and 20
  • Sometimes there are much higher values, around 100 or 1,000 or even more

My problem is that these values can be different every time. The regular range might be somewhere between 1,000 and 1,200, with the exceptions around half a million.

Is there a function to filter out these special numbers?

Recommended answer

Assuming your list is l:

  • If you know you want to filter at a certain percentile/quantile, you can use the snippet below.

    It removes the bottom 10% and the top 10% of the values, that is, everything at or below the 0.1 quantile and everything at or above the 0.9 quantile. You can change either cut-off to whatever you need (for your example, you could drop the lower filter and keep only the values below the 0.9 quantile):

    import numpy as np
    l = np.array(l)
    # Keep only values strictly between the 0.1 and 0.9 quantiles
    l = l[(l > np.quantile(l, 0.1)) & (l < np.quantile(l, 0.9))].tolist()
    print(l)


    Output:

    [3, 2, 14, 2, 8, 4, 3, 5]
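
For the case mentioned above, where only the high values should be dropped, a minimal sketch might look like the following. The helper name `filter_top` and its `q` parameter are illustrative assumptions and not part of the original answer; only `np.quantile` is taken from it.

```python
import numpy as np


def filter_top(values, q=0.9):
    """Hypothetical helper: keep values at or below the q-th quantile."""
    arr = np.asarray(values)
    cutoff = np.quantile(arr, q)
    # <= keeps values equal to the cut-off; the snippet above uses a strict <
    return arr[arr <= cutoff].tolist()


data = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
filtered = filter_top(data)
print(filtered)  # [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
```

With the default linear interpolation, the 0.9 quantile of this list is 80.4, so only 108 and 97 are dropped and the small values, including the repeated 1s, stay in.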
    

  • If you are not sure of the percentile cut-off and simply want to remove outliers, you can use the function below.

    You can adjust the cut-off for outliers through the argument m in the function call. The larger it is, the fewer outliers are removed. This function seems to be more robust to various types of outliers than other outlier removal techniques.

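    import numpy as np

    l = np.array(l)

    def reject_outliers(data, m=6.):
        # Absolute deviation of each value from the median
        d = np.abs(data - np.median(data))
        # Median absolute deviation; fall back to 1 if it is zero
        mdev = np.median(d)
        s = d / (mdev if mdev else 1.)
        # Keep values whose scaled deviation is below the cut-off m
        return data[s < m].tolist()

    print(reject_outliers(l))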
    

    Output:

    [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
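
To see how m changes the result, the following sketch runs the same function on the example list with two cut-offs, and on a made-up list resembling the second scenario from the question (regular values between 1,000 and 1,200, one exception near half a million). The list `large` and the choice of m=2 are assumptions for illustration only; the function is repeated, with an added `np.asarray` call, so that the block runs on its own.

```python
import numpy as np


def reject_outliers(data, m=6.0):
    data = np.asarray(data)
    d = np.abs(data - np.median(data))   # distance of each value from the median
    mdev = np.median(d)                  # median absolute deviation
    s = d / (mdev if mdev else 1.0)      # scaled distance; avoid dividing by zero
    return data[s < m].tolist()


small = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
loose = reject_outliers(small, m=6.0)
strict = reject_outliers(small, m=2.0)
print(loose)   # [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
print(strict)  # [1, 3, 2, 2, 1, 1, 4, 3, 5]

# Assumed data: regular range around 1,000 to 1,200 plus one extreme value
large = [1000, 1100, 1200, 1050, 500000, 1150]
large_kept = reject_outliers(large)
print(large_kept)  # [1000, 1100, 1200, 1050, 1150]
```

Because the cut-off is expressed in multiples of the median absolute deviation rather than as a fixed value, the same default m handles both lists without knowing their regular range in advance.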
    
