Detect and exclude outliers in a pandas DataFrame

Views: 27 Published: 2021/12/3 8:28:05 Tags: python pandas filtering dataframe outliers


Problem description

I have a pandas data frame with a few columns.

I know that certain rows are outliers based on the value in one particular column.

For instance, column 'Vol' has all values around 12xx and one value is 4000 (an outlier).

I would like to exclude the rows whose 'Vol' value is an outlier like this.

So, essentially, I need to put a filter on the data frame such that we select all rows where the values of a certain column are within, say, 3 standard deviations of the mean.

What is an elegant way to achieve this?

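Recommended answer

If your data frame has multiple columns and you want to remove every row that has an outlier in at least one column, the following expression does that in one shot.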

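import numpy as np
import pandas as pd
from scipy import stats

df = pd.DataFrame(np.random.randn(100, 3))

# Keep only the rows in which every column is within 3 standard deviations of its mean.
df[(np.abs(stats.zscore(df)) < 3).all(axis=1)]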

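Explanation:

For each column, it first computes the Z-score of each value in the column, relative to the column mean and standard deviation.
It then takes the absolute Z-score, because the direction does not matter, only whether the value is below the threshold.
all(axis=1) ensures that, for each row, all columns satisfy the constraint.
Finally, the result of this condition is used to index the data frame.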
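The steps above can also be written out one at a time, which may make the one-liner easier to follow. This is only a sketch: the synthetic data, the planted outlier, and the variable names (`z`, `within`, `keep`, `filtered`) are assumptions made for illustration. It computes the Z-scores with plain pandas instead of calling `stats.zscore`, using `ddof=0` to match that function's default.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): 100 rows, 3 columns,
# with one extreme value planted in row 0.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 3)))
df.iloc[0, 1] = 50.0

# Step 1: Z-score of every value relative to its column mean and std.
# ddof=0 (population std) matches the default of scipy.stats.zscore.
z = (df - df.mean()) / df.std(ddof=0)

# Step 2: the direction does not matter, so take absolute values
# and compare them against the threshold.
within = z.abs() < 3

# Step 3: a row is kept only if all of its columns pass.
keep = within.all(axis=1)

# Step 4: index the data frame with the boolean mask.
filtered = df[keep]
```

Here `filtered` no longer contains row 0, because the planted value lies far more than 3 standard deviations from its column mean.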

To filter on a single column instead, specify that column for the zscore, df[0] for example, and remove .all(axis=1).

df[(np.abs(stats.zscore(df[0])) < 3)]
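Applied to the situation in the question, a single-column filter might look like the sketch below. The data is hypothetical (20 values around 12xx plus one 4000, and an invented 'Other' column), and the Z-score is computed with plain pandas rather than scipy.

```python
import pandas as pd

# Hypothetical data modeled on the question: values around 12xx plus one 4000.
df = pd.DataFrame({
    "Vol": [1200 + i for i in range(20)] + [4000],
    "Other": list(range(21)),
})

# Distance of each 'Vol' value from the column mean, in standard deviations.
# Note: Series.std uses ddof=1 by default, while scipy.stats.zscore uses ddof=0.
z = (df["Vol"] - df["Vol"].mean()) / df["Vol"].std()

# Keep the whole row whenever its 'Vol' value is within 3 standard deviations.
filtered = df[z.abs() < 3]
```

One caveat: in a very small sample a single extreme value inflates the standard deviation so much that it can never be flagged. With the sample standard deviation (ddof=1), no value can lie more than (n-1)/sqrt(n) standard deviations from the mean, so a threshold of 3 cannot remove anything when there are fewer than 11 rows.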
