Python: Find outliers inside a list
Question
I have a list with an arbitrary number of integers and/or floats. What I'm trying to do is find the exceptions among my numbers (I hope I'm using the right words to explain this). For example:
list = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
- 90% to 99% of the values are between 1 and 20
- sometimes there are much higher values, such as around 100 or 1,000 or even more
My problem is that these values can be different every time. The regular range might be somewhere between 1,000 and 1,200, with the exceptions in the range of half a million.
Is there a function to filter out these special numbers?
Answer
Assuming your list is l:
If you know you want to filter at a certain percentile/quantile, you can use the code below. It removes the bottom 10% and the top 10% of the values, that is, everything at or below the 0.1 quantile and everything at or above the 0.9 quantile. You can change either cut-off to whatever you need (for example, you can drop the lower filter and only remove the values above the 0.9 quantile in your example):
import numpy as np
l = np.array(l)
# keep only the values strictly between the 0.1 and 0.9 quantiles
l = l[(l > np.quantile(l, 0.1)) & (l < np.quantile(l, 0.9))].tolist()
print(l)
Output:
[3, 2, 14, 2, 8, 4, 3, 5]
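The one-sided variant mentioned above can be sketched as a small helper. The function name filter_by_quantile and its parameters lower and upper are illustrative assumptions and not part of the original answer; the only NumPy call it relies on is np.quantile.

```python
import numpy as np

def filter_by_quantile(values, lower=None, upper=0.9):
    """Keep values strictly inside the given quantile cut-offs.

    Hypothetical helper: pass lower=None or upper=None to skip that side.
    """
    arr = np.asarray(values)
    mask = np.ones(arr.shape, dtype=bool)
    if lower is not None:
        mask &= arr > np.quantile(arr, lower)
    if upper is not None:
        mask &= arr < np.quantile(arr, upper)
    return arr[mask].tolist()

data = [1, 3, 2, 14, 108, 2, 1, 8, 97, 1, 4, 3, 5]
# Only the upper cut-off is applied, so the small values (the 1s) are kept.
print(filter_by_quantile(data, lower=None, upper=0.9))
# [1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
```

Passing lower=0.1 as well reproduces the two-sided filter from the answer.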
If you are not sure of the percentile cut-off and just want to remove outliers:
You can adjust the cut-off for outliers by changing the argument m in the function call. The larger it is, the fewer outliers are removed. This function seems to be more robust to various types of outliers than other outlier removal techniques.
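import numpy as np
l = np.array(l)
def reject_outliers(data, m=6.):
    # absolute deviation of each value from the median
    d = np.abs(data - np.median(data))
    # median of those deviations (median absolute deviation)
    mdev = np.median(d)
    # scale the deviations; fall back to 1 if the median deviation is zero
    s = d / (mdev if mdev else 1.)
    # keep the values whose scaled deviation is below the threshold m
    return data[s < m].tolist()
print(reject_outliers(l))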
Output:
[1, 3, 2, 14, 2, 1, 8, 1, 4, 3, 5]
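To see how this approach behaves when the scale of the data changes, as in the question's second scenario, here is a minimal sketch that returns the outliers separately instead of dropping them. The helper name split_outliers and the sample values are made up for illustration; the scoring mirrors the reject_outliers function above.

```python
import numpy as np

def split_outliers(values, m=6.0):
    """Return (inliers, outliers) using median-based deviation scores.

    Hypothetical helper mirroring reject_outliers; m is the same threshold.
    """
    arr = np.asarray(values)
    d = np.abs(arr - np.median(arr))   # distance of each value from the median
    mdev = np.median(d)                # median absolute deviation
    s = d / (mdev if mdev else 1.0)    # scaled deviation per value
    return arr[s < m].tolist(), arr[s >= m].tolist()

# Made-up sample: regular values around 1,000 to 1,200 plus two extreme ones.
sample = [1000, 1050, 1100, 1150, 1200, 500000, 480000]
inliers, outliers = split_outliers(sample)
print(inliers)   # [1000, 1050, 1100, 1150, 1200]
print(outliers)  # [500000, 480000]
```

Because the deviations are scaled by the data's own median deviation, the same default m works here without knowing the regular range in advance.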