Python pandas: exclude rows below a certain frequency count
So I have a pandas DataFrame that looks like this:
r vals positions
1.2 1
1.8 2
2.3 1
1.8 1
2.1 3
2.0 3
1.9 1
... ...
I would like to filter out all rows whose position does not appear at least 20 times. I have seen something like this:
g=df.groupby('positions')
g.filter(lambda x: len(x) > 20)
but this does not seem to work and I do not understand how to get the original dataframe back from this. Thanks in advance for the help.
On your limited dataset the following works:
In [125]:
df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
Out[125]:
0 1.2
2 2.3
3 1.8
6 1.9
Name: r vals, dtype: float64
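You can assign the result of this filter and use it with isin to filter your original df. Note that isin matches on the values of r vals rather than on group membership, so row 1 (r vals 1.8, position 2) is also returned below because 1.8 also occurs in position 1: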
In [129]:
filtered = df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
df[df['r vals'].isin(filtered)]
Out[129]:
r vals positions
0 1.2 1
1 1.8 2
2 2.3 1
3 1.8 1
6 1.9 1
You just need to change 3 to 20 in your case.
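Since isin on r vals matches by value, one way to keep only the rows from the retained groups is to select by the index of filtered instead. The following is a minimal sketch using only the seven sample rows shown in the question; the names by_value and by_index are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (only the seven rows shown there).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# 'r vals' entries that belong to positions with at least 3 rows.
filtered = df.groupby("positions")["r vals"].filter(lambda x: len(x) >= 3)

# Value-based match: also keeps row 1, whose value 1.8 occurs in position 1.
by_value = df[df["r vals"].isin(filtered)]

# Index-based selection: keeps only the rows that survived the filter.
by_index = df.loc[filtered.index]

print(by_value)
print(by_index)
```

On this sample, by_index holds rows 0, 2, 3 and 6, while by_value additionally holds row 1.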
Another approach would be to use value_counts to create an aggregate series, which we can then use to filter your df:
In [136]:
counts = df['positions'].value_counts()
counts
Out[136]:
1 4
3 2
2 1
dtype: int64
In [137]:
counts[counts > 3]
Out[137]:
1 4
dtype: int64
In [135]:
df[df['positions'].isin(counts[counts > 3].index)]
Out[135]:
r vals positions
0 1.2 1
2 2.3 1
3 1.8 1
6 1.9 1
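The value_counts steps above can be collected into a small helper. This is a sketch under assumptions: the function name keep_frequent and its min_count parameter are hypothetical, and the comparison uses >= so that "at least 20 times" would translate to min_count=20.

```python
import pandas as pd

def keep_frequent(df, column, min_count):
    """Return the rows of df whose value in `column` occurs at least min_count times."""
    # Number of rows per distinct value in the column.
    counts = df[column].value_counts()
    # Values that occur often enough.
    frequent = counts[counts >= min_count].index
    # Keep rows whose value is one of the frequent ones.
    return df[df[column].isin(frequent)]

# Sample data copied from the question (only the seven rows shown there).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

result = keep_frequent(df, "positions", min_count=3)
print(result)
```

Because isin is applied to the positions column here, rows are matched on group membership rather than on their r vals values.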
EDIT
If you want to filter the groupby object on the dataframe rather than a Series, then you can call filter on the groupby object directly:
In [139]:
filtered = df.groupby('positions').filter(lambda x: len(x) >= 3)
filtered
Out[139]:
r vals positions
0 1.2 1
2 2.3 1
3 1.8 1
6 1.9 1
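As an alternative that is not part of the original answer, the same rows can be selected with a boolean mask built from transform("size"), which avoids calling a Python lambda for each group and may be faster on large frames. The sketch below assumes the seven sample rows from the question and compares the mask with the filter call shown above; the variable names are illustrative.

```python
import pandas as pd

# Sample data copied from the question (only the seven rows shown there).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# For every row, the size of the group it belongs to, aligned with df's index.
group_sizes = df.groupby("positions")["positions"].transform("size")

# Boolean-mask selection of rows whose position occurs at least 3 times.
mask_result = df[group_sizes >= 3]

# The filter-based version from the answer, for comparison.
filter_result = df.groupby("positions").filter(lambda x: len(x) >= 3)

print(mask_result)
print(filter_result)
```

Both selections return rows from the original dataframe with their original index, so no extra step is needed to get back to the original data.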