pandas DataFrame: delete rows with low-frequency values

Problem description

What is the best practice for removing all rows whose value in a given column occurs with low frequency?

DataFrame:

IN:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
7   c   C
8   d   B
9   e   B

Example 1: Remove all rows whose value in column 'poo' occurs fewer than 3 times:

OUT:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A
8   d   B
9   e   B

Example 2: Remove all rows whose value in column 'bar' occurs fewer than 3 times:

OUT:
foo bar poo
1   a   A
2   a   A
3   a   B
4   b   B
5   b   A
6   b   A

Recommended answer

This should generalise pretty easily. You'll need groupby + transform + count to get, for every row, the number of times its value occurs in the column, and then filter on that result:

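col = 'poo'  # column to filter on; use 'bar' for Example 2
n = 3        # minimum number of occurrences needed to keep a row (e.g. 2)

# Per-row count of the row's value in col, aligned to df's index; keep counts >= n
df[df.groupby(col)[col].transform('count').ge(n)]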

   foo bar poo
0    1   a   A
1    2   a   A
2    3   a   B
3    4   b   B
4    5   b   A
5    6   b   A
7    8   d   B
8    9   e   B
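
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. It rebuilds the DataFrame from the question and wraps the one-liner in a small function. The function name drop_low_frequency and its parameter names are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "foo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bar": ["a", "a", "a", "b", "b", "b", "c", "d", "e"],
    "poo": ["A", "A", "B", "B", "A", "A", "C", "B", "B"],
})


def drop_low_frequency(frame, col, n):
    """Keep only rows whose value in `col` occurs at least `n` times.

    Hypothetical helper: it only wraps the groupby/transform one-liner.
    """
    # transform("count") returns a Series aligned with frame's index,
    # where each row holds the size of the group it belongs to.
    counts = frame.groupby(col)[col].transform("count")
    return frame[counts.ge(n)]


example_1 = drop_low_frequency(df, "poo", 3)  # drops the single 'C' row
example_2 = drop_low_frequency(df, "bar", 3)  # drops the 'c', 'd', 'e' rows

print(example_1["foo"].tolist())  # [1, 2, 3, 4, 5, 6, 8, 9]
print(example_2["foo"].tolist())  # [1, 2, 3, 4, 5, 6]
```

The original index labels are preserved (row 6 is simply missing in Example 1), which matches the output shown above. If a fresh 0..n-1 index is wanted, calling reset_index(drop=True) on the result should do it.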
