Python pandas - filter rows after groupby
Question
For example, I have the following table:
index,A,B
0,0,0
1,0,8
2,0,8
3,1,5
4,1,3
After grouping by A:
0:
index,A,B
0,0,0
1,0,8
2,0,8
1:
index,A,B
3,1,5
4,1,3
What I need is to drop the rows in each group where the value in column B is less than the maximum of column B within that group. I had trouble putting this into words, so here is an example:
Maximum value of column B in group 0: 8
So I want to drop the row with index 0 and keep the rows with indexes 1 and 2.
Maximum value of column B in group 1: 5
So I want to drop the row with index 4 and keep the row with index 3.
I tried to use the pandas filter function, but the problem is that it operates on all the rows of a group at once:
data = <example table>
grouped = data.groupby("A")
filtered = grouped.filter(lambda x: x["B"] == x["B"].max())
So ideally I need some kind of filter that iterates over all the rows in a group.
Thanks for your help!
P.S. Is there also a way to only delete the rows within the groups, without returning a DataFrame object?
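A minimal sketch of why the filter attempt above does not filter row by row, assuming the example table from the question: as far as the pandas documentation describes it, the callable passed to GroupBy.filter should return a single True/False per group, and each group is then kept or dropped as a whole. The threshold 5 below is a hypothetical value chosen only for illustration.

```python
import pandas as pd

# Example table from the question (group 1 uses the values shown in the grouped view).
data = pd.DataFrame(
    {"A": [0, 0, 0, 1, 1], "B": [0, 8, 8, 5, 3]},
    index=pd.Index(range(5), name="index"),
)

# The callable sees one whole group and returns ONE boolean for it,
# so a group is either kept entirely or dropped entirely.
# The threshold 5 is an arbitrary value for illustration.
kept = data.groupby("A").filter(lambda g: g["B"].max() > 5)
print(kept)
```

Group 0 is kept in full, including the row with B equal to 0, and group 1 is dropped in full. A lambda that returns a boolean Series per group, as in the attempt above, does not match what filter expects.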
Recommended answer
You just need to use apply on the groupby object. I modified your example data to make this a little clearer:
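import pandas
from io import StringIO

# Sample data: two groups in column A, with a distinct maximum of B in each
csv = StringIO("""index,A,B
0,1,0.0
1,1,3.0
2,1,6.0
3,2,0.0
4,2,5.0
5,2,7.0""")
df = pandas.read_csv(csv, index_col='index')
groups = df.groupby(by=['A'])
# Within each group, keep only the rows where B equals that group's maximum
print(groups.apply(lambda g: g[g['B'] == g['B'].max()]))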
This prints:
         A    B
A index
1 2      1  6.0
2 5      2  7.0
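As an alternative to apply, the same selection can be sketched with transform and a boolean mask. This is not the method from the answer above, just another common pandas idiom; the variable names group_max, mask and result are made up for this example.

```python
import pandas as pd

# Same data as in the answer above, built directly instead of read from CSV.
df = pd.DataFrame(
    {"A": [1, 1, 1, 2, 2, 2], "B": [0.0, 3.0, 6.0, 0.0, 5.0, 7.0]},
    index=pd.Index(range(6), name="index"),
)

# transform("max") returns a Series aligned with df:
# every row receives the maximum of B within its own group.
group_max = df.groupby("A")["B"].transform("max")

# Keep the rows whose B equals the group maximum (ties would all be kept).
mask = df["B"] == group_max
result = df[mask]
print(result)
```

Here result keeps the original flat index (rows 2 and 5) rather than the two-level index that apply produces. Regarding the P.S. in the question, one possible way to delete the rows in place would be df.drop(df.index[~mask], inplace=True), which modifies df instead of returning a new DataFrame.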