Python pandas: exclude rows below a certain frequency count


Problem description

So I have a pandas DataFrame that looks like this:

r vals    positions
1.2       1
1.8       2
2.3       1
1.8       1
2.1       3
2.0       3
1.9       1
...       ...

I would like to filter out all rows whose position does not appear at least 20 times. I have seen something like this:

g=df.groupby('positions')
g.filter(lambda x: len(x) > 20)

but this does not seem to work and I do not understand how to get the original dataframe back from this. Thanks in advance for the help.

Solution

On your limited dataset, the following works:

In [125]:
df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)

Out[125]:
0    1.2
2    2.3
3    1.8
6    1.9
Name: r vals, dtype: float64

You can assign the result of this filter and use it with isin to filter your original df:

In [129]:
filtered = df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
df[df['r vals'].isin(filtered)]

Out[129]:
   r vals  positions
0     1.2          1
1     1.8          2
2     2.3          1
3     1.8          1
6     1.9          1

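In your case you just need to change 3 to 20. Note that isin here matches on the values of 'r vals', not on group membership: row 1 (position 2) is kept above only because its value 1.8 also occurs in position 1.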
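The difference between matching on values and matching on index labels can be seen in a small sketch. The DataFrame below is rebuilt from the seven rows shown in the question (the elided rows are not guessed), and the names by_value and by_index are only illustrative. Selecting with the index of the filtered Series is one possible way to avoid the stray row; it is not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the seven visible rows of the question.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# Keep only the 'r vals' entries whose position occurs at least 3 times.
filtered = df.groupby("positions")["r vals"].filter(lambda x: len(x) >= 3)

# Matching on values: row 1 (position 2) is kept as well,
# because its value 1.8 also occurs in position 1.
by_value = df[df["r vals"].isin(filtered)]

# Matching on index labels: only rows from the qualifying group remain.
by_index = df.loc[filtered.index]

print(by_value)
print(by_index)
```

With real data, where equal values in different groups are likely, the index-based selection is probably the safer of the two.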

Another approach would be to use value_counts to create an aggregate Series, which we can then use to filter your df:

In [136]:
counts = df['positions'].value_counts()
counts

Out[136]:
1    4
3    2
2    1
dtype: int64

In [137]:
counts[counts > 3]

Out[137]:
1    4
dtype: int64

In [135]:
df[df['positions'].isin(counts[counts > 3].index)]

Out[135]:
   r vals  positions
0     1.2          1
2     2.3          1
3     1.8          1
6     1.9          1
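As a variation on the value_counts idea, the counts can be mapped back onto every row to build a boolean mask in one step. This is a sketch under assumptions: the sample data is rebuilt from the question, min_count is a hypothetical name for the threshold, and the comparison uses >= (at least), whereas the transcript above uses a strict > 3. On this sample both give the same rows.

```python
import pandas as pd

# Sample data reconstructed from the seven visible rows of the question.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

min_count = 3  # hypothetical threshold; the question asks for 20

# Number of occurrences of each position, indexed by position.
counts = df["positions"].value_counts()

# Look up each row's position in counts to get a per-row group size.
row_counts = df["positions"].map(counts)

# Keep the rows whose position occurs at least min_count times.
result = df[row_counts >= min_count]
print(result)
```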

EDIT

If you want the filtered result as a DataFrame rather than a Series, you can call filter on the DataFrame's groupby object directly:

In [139]:
filtered = df.groupby('positions').filter(lambda x: len(x) >= 3)
filtered

Out[139]:
   r vals  positions
0     1.2          1
2     2.3          1
3     1.8          1
6     1.9          1
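One likely reason the snippet in the question seemed not to work is that filter returns a new DataFrame and leaves df untouched, so the result has to be assigned. A minimal sketch follows, assuming the sample data from the question; the helper name drop_rare_groups and its parameters are hypothetical and only wrap the call shown above. Note also that "at least 20 times" corresponds to >= 20, not > 20.

```python
import pandas as pd


def drop_rare_groups(frame, column, min_count):
    """Return the rows of frame whose value in column occurs at least min_count times.

    Hypothetical helper: a thin wrapper around groupby(...).filter(...).
    """
    return frame.groupby(column).filter(lambda group: len(group) >= min_count)


# Sample data reconstructed from the seven visible rows of the question.
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# The result must be assigned; df itself is not modified.
kept = drop_rare_groups(df, "positions", min_count=3)

print(len(df))    # the original still has all rows
print(kept)
```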
