Python pandas - filter rows after groupby

Problem description

For example, I have the following table:

index,A,B
0,0,0
1,0,8
2,0,8
3,1,5
4,1,3

After grouping by A:

0:
index,A,B
0,0,0
1,0,8
2,0,8

1:
index,A,B
3,1,5
4,1,3

What I need is to drop, from each group, every row whose value in column B is less than the maximum of column B within that group. I have trouble formulating this in English, so here is an example:

Maximum value from rows in column B in group 0: 8

So I want to drop row with index 0 and keep rows with indexes 1 and 2

Maximum value from rows in column B in group 1: 5

So I want to drop row with index 4 and keep row with index 3

I have tried to use the pandas filter function, but the problem is that it operates on all rows of a group at once:

data = <example table>
grouped = data.groupby("A")
filtered = grouped.filter(lambda x: x["B"] == x["B"].max())

So what I ideally need is a filter that is evaluated for each row within a group.

Thanks for your help!

P.S. Is there also a way to only delete rows within the groups, without returning a new DataFrame object?

Solution

You just need to use apply on the groupby object. I modified your example data to make this a little clearer:

import pandas
from io import StringIO

csv = StringIO("""index,A,B
0,1,0.0
1,1,3.0
2,1,6.0
3,2,0.0
4,2,5.0
5,2,7.0""")

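df = pandas.read_csv(csv, index_col='index')
groups = df.groupby(by=['A'])
# apply calls the lambda once per group; the boolean mask keeps only
# the rows whose B equals that group's maximum B.
print(groups.apply(lambda g: g[g['B'] == g['B'].max()]))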

Which prints:

         A    B
A index
1 2      1  6.0
2 5      2  7.0
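
As an alternative, here is a minimal sketch that builds a row-level boolean mask with transform instead of apply. It reuses the example data above; the names is_group_max and kept are only illustrative. This keeps the original index rather than adding A as an extra index level, and it avoids depending on how apply treats the grouping column, which I believe differs between pandas versions. The last line is one possible reading of the P.S.: it drops the unwanted rows from df itself instead of returning a new object.

```python
import pandas as pd
from io import StringIO

csv = StringIO("""index,A,B
0,1,0.0
1,1,3.0
2,1,6.0
3,2,0.0
4,2,5.0
5,2,7.0""")
df = pd.read_csv(csv, index_col="index")

# transform("max") broadcasts each group's maximum of B back onto the rows
# of that group, so the comparison gives one True/False per original row.
is_group_max = df["B"] == df.groupby("A")["B"].transform("max")

# New DataFrame holding only the per-group maximum rows (ties are all kept).
kept = df[is_group_max]
print(kept)

# In-place variant: remove the non-maximal rows from df itself.
df.drop(index=df[~is_group_max].index, inplace=True)
```

Because the mask is computed per row, ties are handled naturally: in the data from the question, both rows with B equal to 8 in group 0 would be kept.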
