How to use pandas filter with IQR?


Question

Is there a built-in way to filter a column by IQR (i.e. keep values between Q1 - 1.5*IQR and Q3 + 1.5*IQR)? Suggestions for any other generalized filtering approaches in pandas would also be appreciated.

Recommended answer

As far as I know, the most compact notation is provided by the query method.

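import numpy as np
import pandas as pd

# Some test data
np.random.seed(33454)
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead
df = pd.concat(
    [
        # Integers drawn uniformly from [0, 100)
        pd.DataFrame({'nb': np.random.randint(0, 100, 20)}),
        # Adding some outliers
        pd.DataFrame({'nb': np.random.randint(100, 200, 2)}),
    ],
    # Resetting the index
    ignore_index=True,
)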

# Computing IQR
Q1 = df['nb'].quantile(0.25)
Q3 = df['nb'].quantile(0.75)
IQR = Q3 - Q1

# Filtering values between Q1 - 1.5*IQR and Q3 + 1.5*IQR
filtered = df.query('(@Q1 - 1.5 * @IQR) <= nb <= (@Q3 + 1.5 * @IQR)')

Then we can plot the result to check the difference. We observe that the outlier in the left boxplot (the cross at 183) no longer appears in the filtered series.

# Plotting the result to check the difference
df.join(filtered, rsuffix='_filtered').boxplot()
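For a more general filter, the same bounds can also be applied with a boolean mask instead of query. The following is only a minimal sketch: the helper name iqr_filter, its parameter k, and the sample data are hypothetical and not part of the original answer. It relies on Series.quantile and Series.between, whose bounds are inclusive by default.

```python
import pandas as pd


def iqr_filter(frame, column, k=1.5):
    """Keep rows whose `column` value lies within [Q1 - k*IQR, Q3 + k*IQR].

    Hypothetical helper for illustration; not a pandas built-in.
    """
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    # between() is inclusive on both ends by default
    mask = frame[column].between(q1 - k * iqr, q3 + k * iqr)
    return frame[mask]


# Small made-up sample with one obvious outlier
sample = pd.DataFrame({'nb': [10, 12, 14, 15, 16, 18, 20, 183]})
kept = iqr_filter(sample, 'nb')
```

Wrapping the computation in a function makes it easy to reuse on other columns or to change the multiplier k, at the cost of being slightly longer than the one-line query expression.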

Since writing this answer, I've written a post on this topic where you may find more information.
