pandas DataFrame: delete rows with low-frequency values
Question
What is the best practice for removing all rows whose value in a given column occurs with low frequency?
DataFrame:
IN:
foo bar poo
1 a A
2 a A
3 a B
4 b B
5 b A
6 b A
7 c C
8 d B
9 e B
Example 1: Remove all rows whose value in column 'poo' occurs fewer than 3 times:
OUT:
foo bar poo
1 a A
2 a A
3 a B
4 b B
5 b A
6 b A
8 d B
9 e B
Example 2: Remove all rows whose value in column 'bar' occurs fewer than 3 times:
OUT:
foo bar poo
1 a A
2 a A
3 a B
4 b B
5 b A
6 b A
Recommended answer
This should generalise pretty easily. You'll need groupby + transform + count, and then filter the result:
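col = 'poo'  # or 'bar'
n = 3  # minimum number of occurrences to keep (or 2)
# Count occurrences of each value in col, broadcast per row, keep rows with count >= n
df[df.groupby(col)[col].transform('count').ge(n)]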
foo bar poo
0 1 a A
1 2 a A
2 3 a B
3 4 b B
4 5 b A
5 6 b A
7 8 d B
8 9 e B
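For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample data is reconstructed from the question; the helper names `drop_low_frequency` and `drop_low_frequency_vc` are hypothetical and not part of pandas. The second helper shows an alternative built on `value_counts` + `map`, which gives the same result on this data.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "foo": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "bar": list("aaabbbcde"),
    "poo": list("AABBAACBB"),
})


def drop_low_frequency(frame, col, n):
    """Keep rows whose value in `col` occurs at least `n` times (hypothetical helper)."""
    # transform('count') returns one count per row, aligned with the original index.
    counts = frame.groupby(col)[col].transform("count")
    return frame[counts.ge(n)]


def drop_low_frequency_vc(frame, col, n):
    """Same idea via value_counts + map (hypothetical helper)."""
    # value_counts gives one count per distinct value; map looks it up for each row.
    counts = frame[col].map(frame[col].value_counts())
    return frame[counts.ge(n)]


by_poo = drop_low_frequency(df, "poo", 3)  # drops the single 'C' row
by_bar = drop_low_frequency(df, "bar", 3)  # drops the 'c', 'd' and 'e' rows
print(by_poo)
print(by_bar)
```

Both helpers return a filtered view of the rows with the original index labels preserved; call `.reset_index(drop=True)` on the result if a fresh 0-based index is wanted.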