Finding outliers in a data set

Views: 35 | Published: 2021/8/30 18:43:23 | Tags: python, statistics

Problem description

I have a Python script that builds a list of lists of server uptime and performance data, where each sub-list (or 'row') contains a particular cluster's stats. Nicely formatted, it looks something like this:

-------  -------------  ------------  ----------  -------------------
Cluster  %Availability  Requests/Sec  Errors/Sec  %Memory_Utilization
-------  -------------  ------------  ----------  -------------------
ams-a    98.099         1012          678         91
bos-a    98.099         1111          12          91
bos-b    55.123         1513          576         22
lax-a    99.110         988           10          89
pdx-a    98.123         1121          11          90
ord-b    75.005         1301          123         100
sjc-a    99.020         1000          10          88
...(so on)...

So in list form, it might look like:

[['ams-a', 98.099, 1012, 678, 91], ['bos-a', 98.099, 1111, 12, 91], ...]

My question: what's the best way to determine the outliers in each column? Or are outliers not necessarily the best way to attack the problem of finding 'badness'? In the data above, I'd definitely want to know about bos-b and ord-b, as well as ams-a since its error rate is so high, but the others can be discarded. Whether a value is bad depends on the column, since higher is not necessarily worse and neither is lower, so I'm trying to figure out the most efficient way to do this. numpy seems to get mentioned a lot for this sort of thing, but I'm not sure where to even start with it (sadly, I'm more sysadmin than statistician...).
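To make the per-column, direction-dependent idea concrete, here is a minimal sketch using only the standard library. It is one possible framing, not a method taken from the question: it assumes a modified z-score (median and median absolute deviation, with the conventional 0.6745 scaling and a 3.5 cutoff), and the `BAD_DIRECTION` mapping and the `flag_outliers` function are hypothetical names. Which direction counts as bad for each column is also an assumption.

```python
from statistics import median

COLUMNS = ["%Availability", "Requests/Sec", "Errors/Sec", "%Memory_Utilization"]

# Assumed direction of "badness" per column (not stated in the question).
BAD_DIRECTION = {
    "%Availability": "low",
    "Requests/Sec": "both",
    "Errors/Sec": "high",
    "%Memory_Utilization": "high",
}

# The seven sample rows from the table above.
rows = [
    ["ams-a", 98.099, 1012, 678, 91],
    ["bos-a", 98.099, 1111, 12, 91],
    ["bos-b", 55.123, 1513, 576, 22],
    ["lax-a", 99.110, 988, 10, 89],
    ["pdx-a", 98.123, 1121, 11, 90],
    ["ord-b", 75.005, 1301, 123, 100],
    ["sjc-a", 99.020, 1000, 10, 88],
]


def flag_outliers(rows, threshold=3.5):
    """Return {cluster: [column, ...]} for values that are outliers in the bad direction."""
    flagged = {}
    for idx, name in enumerate(COLUMNS, start=1):  # column 0 is the cluster name
        values = [row[idx] for row in rows]
        med = median(values)
        # Median absolute deviation: a spread estimate that extreme values barely move.
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            continue  # at least half the values equal the median; score is undefined
        direction = BAD_DIRECTION[name]
        for row in rows:
            score = 0.6745 * (row[idx] - med) / mad  # modified z-score
            is_bad = (
                (direction == "high" and score > threshold)
                or (direction == "low" and score < -threshold)
                or (direction == "both" and abs(score) > threshold)
            )
            if is_bad:
                flagged.setdefault(row[0], []).append(name)
    return flagged


for cluster, columns in flag_outliers(rows).items():
    print(cluster, columns)
```

On the seven sample rows this flags ams-a (errors), bos-b (availability, errors) and ord-b (availability, errors, memory) and leaves the rest alone. Median and MAD are used instead of mean and standard deviation because, with so few rows, a single extreme value inflates the standard deviation enough to hide itself.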

Thanks in advance!
