Python pandas: exclude rows below a certain frequency count

Views: 174 | Published: 2017/11/8 19:49:26 | Tags: python pandas filter dataframe


Problem description

So I have a pandas DataFrame that looks like this:

    r vals  positions
    1.2     1
    1.8     2
    2.3     1
    1.8     1
    2.1     3
    2.0     3
    1.9     1
    ...     ...

I would like to filter out all rows whose position does not appear at least 20 times. I have seen something like this:

    g = df.groupby('positions')
    g.filter(lambda x: len(x) > 20)

but this does not seem to work, and I do not understand how to get the original DataFrame back from this. Thanks in advance for the help.

Solution

On your limited dataset the following works:

    In [125]:
    df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)

    Out[125]:
    0    1.2
    2    2.3
    3    1.8
    6    1.9
    Name: r vals, dtype: float64

You can assign the result of this filter and use it with isin to filter your original df:

    In [129]:
    filtered = df.groupby('positions')['r vals'].filter(lambda x: len(x) >= 3)
    df[df['r vals'].isin(filtered)]

    Out[129]:
       r vals  positions
    0     1.2          1
    1     1.8          2
    2     2.3          1
    3     1.8          1
    6     1.9          1

Note that isin here matches on the values of r vals, which is why row 1 (1.8 at position 2) appears in the output: the value 1.8 also occurs at position 1.

You just need to change 3 to 20 in your case.
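The difference between matching on values and matching on row labels can be checked with a small self-contained sketch. It assumes the DataFrame holds only the seven rows shown in the question; the names `by_value` and `by_index` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question (only the seven rows shown there).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# Keep the 'r vals' entries whose position group has at least 3 rows.
filtered = df.groupby("positions")["r vals"].filter(lambda x: len(x) >= 3)

# Matching on values also keeps row 1, because 1.8 occurs at position 1 too.
by_value = df[df["r vals"].isin(filtered)]

# Selecting by the surviving index labels keeps only rows from position 1.
by_index = df.loc[filtered.index]

print(by_value.index.tolist())
print(by_index.index.tolist())
```

Selecting with `filtered.index` relies on the filtered Series keeping the original row labels, so it is probably the safer choice when values in `r vals` can repeat across positions.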

Another approach would be to use value_counts to create an aggregate series, which we can then use to filter your df:

    In [136]:
    counts = df['positions'].value_counts()
    counts

    Out[136]:
    1    4
    3    2
    2    1
    dtype: int64

    In [137]:
    counts[counts > 3]

    Out[137]:
    1    4
    dtype: int64

    In [135]:
    df[df['positions'].isin(counts[counts > 3].index)]

    Out[135]:
       r vals  positions
    0     1.2          1
    2     2.3          1
    3     1.8          1
    6     1.9          1

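The value_counts approach can be wrapped in a small helper so that the threshold becomes a parameter. This is only a sketch: the function name `keep_frequent` and its arguments are hypothetical, and it uses `>=` so that "at least N times" from the question is matched literally.

```python
import pandas as pd

def keep_frequent(frame, column, min_count):
    """Return the rows whose value in `column` occurs at least `min_count` times."""
    counts = frame[column].value_counts()
    # The index of the counts Series holds the distinct values of the column.
    frequent = counts[counts >= min_count].index
    return frame[frame[column].isin(frequent)]

# Sample data copied from the question (only the seven rows shown there).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

result = keep_frequent(df, "positions", 3)
print(result)
```

On the full dataset the call would presumably be `keep_frequent(df, "positions", 20)`; on the seven sample rows that threshold leaves nothing, since no position appears 20 times.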
EDIT

If you want to filter the groupby object on the DataFrame rather than a Series, then you can call filter on the groupby object directly:

    In [139]:
    filtered = df.groupby('positions').filter(lambda x: len(x) >= 3)
    filtered

    Out[139]:
       r vals  positions
    0     1.2          1
    2     2.3          1
    3     1.8          1
    6     1.9          1
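As a runnable sketch of the direct call: filter returns a new DataFrame and leaves `df` untouched, so the result has to be assigned, which may be why the snippet in the question appeared not to work. The `transform("size")` variant shown next to it is an alternative not mentioned in the original answer; it builds a per-row group size and uses it as a boolean mask. The sample data again assumes only the seven rows from the question.

```python
import pandas as pd

# Sample data copied from the question (only the seven rows shown there).
df = pd.DataFrame({
    "r vals": [1.2, 1.8, 2.3, 1.8, 2.1, 2.0, 1.9],
    "positions": [1, 2, 1, 1, 3, 3, 1],
})

# filter() returns a new DataFrame; df itself is not modified.
filtered = df.groupby("positions").filter(lambda g: len(g) >= 3)

# Alternative (not from the original answer): size of each row's group,
# aligned to the original index, then used as a boolean mask.
group_sizes = df.groupby("positions")["positions"].transform("size")
same_rows = df[group_sizes >= 3]

print(filtered)
print(len(df))  # still the full frame
```

Both variants keep the original row labels, so the surviving rows can still be traced back to the unfiltered frame.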
