Python pandas - filter rows after groupby
Problem description
For example, I have the following table:
index,A,B
0,0,0
1,0,8
2,0,8
3,1,5
4,1,3
After grouping by A:
0:
index,A,B
0,0,0
1,0,8
2,0,8
1:
index,A,B
3,1,5
4,1,3
What I need is to drop the rows in each group where the value in column B is less than the maximum of column B within that group. I have trouble putting this into words in English, so here is an example:
Maximum value in column B in group 0: 8
So I want to drop the row with index 0 and keep the rows with indexes 1 and 2.
Maximum value in column B in group 1: 5
So I want to drop the row with index 4 and keep the row with index 3.
I have tried to use the pandas filter function, but the problem is that it operates on all rows of a group at once:
data = <example table>
grouped = data.groupby("A")
filtered = grouped.filter(lambda x: x["B"] == x["B"].max())
So what I ideally need is a filter that is evaluated for each row within a group.
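To illustrate why the attempt above does not work, here is a minimal sketch of how filter behaves on a groupby object. The DataFrame is rebuilt by hand from the example table (using the group 1 values shown in the grouped view), and the threshold 5 is an arbitrary value chosen only for the illustration:

```python
import pandas as pd

# Example table rebuilt by hand (assumption: group 1 rows as in the grouped view).
data = pd.DataFrame(
    {"A": [0, 0, 0, 1, 1], "B": [0, 8, 8, 5, 3]},
    index=pd.Index([0, 1, 2, 3, 4], name="index"),
)
grouped = data.groupby("A")

# filter() expects ONE boolean per group and keeps or drops the whole group.
# Only group 0 has a maximum above 5, so all three of its rows are kept.
whole_groups = grouped.filter(lambda g: g["B"].max() > 5)
print(whole_groups)

# Returning a row-wise boolean Series, as in the attempt above, is rejected.
try:
    grouped.filter(lambda g: g["B"] == g["B"].max())
except (TypeError, ValueError) as exc:
    print(f"filter rejected the row-wise mask: {exc}")
```

So filter can only decide the fate of entire groups, which is why selecting individual rows inside each group needs a different tool.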
Thanks for your help!
P.S. Is there also a way to delete the rows within the groups in place, without returning a new DataFrame object?
Solution
You just need to use apply on the groupby object. I modified your example data to make this a little clearer:
import pandas
from io import StringIO
csv = StringIO("""index,A,B
0,1,0.0
1,1,3.0
2,1,6.0
3,2,0.0
4,2,5.0
5,2,7.0""")
df = pandas.read_csv(csv, index_col='index')
groups = df.groupby(by=['A'])
print(groups.apply(lambda g: g[g['B'] == g['B'].max()]))
Which prints (the exact layout can vary with the pandas version):
         A    B
A index
1 2      1  6.0
2 5      2  7.0
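As a possible alternative to apply (not part of the original answer), the same rows can be selected with a boolean mask built from transform, which keeps the original flat index instead of adding the group key as an extra index level. The names group_max and kept are made up for this sketch, and the last line is one way to address the P.S. by dropping rows from df itself:

```python
import pandas as pd
from io import StringIO

csv = StringIO("""index,A,B
0,1,0.0
1,1,3.0
2,1,6.0
3,2,0.0
4,2,5.0
5,2,7.0""")
df = pd.read_csv(csv, index_col="index")

# transform("max") returns a Series aligned with df that holds,
# for every row, the maximum of B within that row's group.
group_max = df.groupby("A")["B"].transform("max")

# Keep only the rows whose B equals the maximum of their group.
kept = df[df["B"] == group_max]
print(kept)

# For the P.S.: remove the other rows from df itself instead of
# building a new frame.
df.drop(index=df[df["B"] < group_max].index, inplace=True)
```

Rows that tie for the maximum within a group are all kept, which matches the question's group 0, where indexes 1 and 2 both have B equal to 8.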