Pandas: find group index of first row matching a predicate in a group, if any
Question
I want to group a DataFrame by some criteria, and then find the integer index within each group (not within the DataFrame) of the first row satisfying some predicate. If there is no such row, I want to get NaN.
For example, I group by column a divided by 5 and then, in each group, find the index of the first row where column b is "red":
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': range(12), 'b': ['red', 'green', 'blue'] * 4})
     a      b
0    0    red
1    1  green
2    2   blue
3    3    red
4    4  green
5    5   blue
6    6    red
7    7  green
8    8   blue
9    9    red
10  10  green
11  11   blue
df.groupby(df.a // 5).apply(lambda g: next((idx for idx, row in g.reset_index(drop=True).iterrows() if row.b == "red"), None))
a
0     0
1     1
2   NaN
dtype: float64
(I guess I'm assuming rows stay in the same order as in the original DataFrame, but I can sort the group if needed.) Is there a more concise, efficient way to do this?
Recommended answer
This is a bit longer, but IMHO it is easier to understand and customize:
In [126]: df2 = df.copy()
This is your group indicator:
In [127]: g = df.a//5
Keep a reference to the resulting groupby object:
In [128]: grp = df.groupby(g)
Create one column holding the generated group key and another holding the cumulative count within the group:
In [129]: df2['group'] = g
In [130]: df2['count'] = grp.cumcount()
In [131]: df2
Out[131]:
     a      b  group  count
0    0    red      0      0
1    1  green      0      1
2    2   blue      0      2
3    3    red      0      3
4    4  green      0      4
5    5   blue      1      0
6    6    red      1      1
7    7  green      1      2
8    8   blue      1      3
9    9    red      1      4
10  10  green      2      0
11  11   blue      2      1
Filtering and then grouping gives you back the first element that you want. The count column is the position within the group:
In [132]: df2[df2.b=='red'].groupby('group').first()
Out[132]:
       a    b  count
group
0      0  red      0
1      6  red      1
To include every group key, even the groups for which the filter returned nothing, reindex the result by the full set of group keys:
In [133]: df2[df2.b=='red'].groupby('group').first().reindex(grp.groups.keys())
Out[133]:
     a    b  count
0    0  red      0
1    6  red      1
2  NaN  NaN    NaN
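The steps above can be collected into one small helper. This is only a sketch under a few assumptions: it targets Python 3 (hence range rather than xrange), the function name first_match_position is made up for illustration, and it returns just the within-group position rather than the whole row.

```python
import pandas as pd

df = pd.DataFrame({"a": range(12), "b": ["red", "green", "blue"] * 4})


def first_match_position(frame, key, mask):
    """Hypothetical helper: position within each group of the first row
    where `mask` is True, or NaN for groups with no matching row."""
    # 0-based position of every row inside its own group
    pos = frame.groupby(key).cumcount()
    # Keep only matching rows, then take the first position per group
    first = pos[mask].groupby(key[mask]).first()
    # Reindex by all group keys so groups without a match show up as NaN
    return first.reindex(key.unique())


key = df["a"] // 5
result = first_match_position(df, key, df["b"] == "red")
print(result)
```

Because the last group has no "red" row, reindexing introduces a NaN and the result comes back as a float Series, matching the shape of the output asked for in the question.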