Pandas: find group index of first row matching a predicate in a group, if any


Question

I want to group a DataFrame by some criteria, and then find the integer index in the group (not the DataFrame) of the first row satisfying some predicate. If there is no such row, I want to get NaN.

For example, I group by column a divided by 5 and then in each group, find the index of the first row where column b is "red":

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': range(12), 'b': ['red', 'green', 'blue'] * 4})  # range instead of Python 2's xrange


     a      b
0    0    red
1    1  green
2    2   blue
3    3    red
4    4  green
5    5   blue
6    6    red
7    7  green
8    8   blue
9    9    red
10  10  green
11  11   blue

df.groupby(df.a // 5).apply(lambda g: next((idx for idx, row in g.reset_index(drop=True).iterrows() if row.b == "red"), None))


a
0     0
1     1
2   NaN
dtype: float64

(I guess I'm assuming rows stay in the same order as in the original DataFrame, but I can sort the group if needed.) Is there a more concise, efficient way to do this?

Answer

This is a bit longer, but IMHO it is more understandable and customizable.

In [126]: df2 = df.copy()

This is your group indicator:

In [127]: g = df.a//5

Keep a reference to the resulting groups:

In [128]: grp = df.groupby(g)

Create columns holding the group key and the cumulative count within the group:

In [129]: df2['group'] = g

In [130]: df2['count'] = grp.cumcount()

In [131]: df2
Out[131]: 
     a      b  group  count
0    0    red      0      0
1    1  green      0      1
2    2   blue      0      2
3    3    red      0      3
4    4  green      0      4
5    5   blue      1      0
6    6    red      1      1
7    7  green      1      2
8    8   blue      1      3
9    9    red      1      4
10  10  green      2      0
11  11   blue      2      1
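To see what the cumulative-count step computes on its own, here is a minimal sketch on the question's example data (assuming Python 3, hence range instead of xrange). GroupBy.cumcount() numbers the rows of each group 0, 1, 2, ... in their current order, which should be the same position that g.reset_index(drop=True) assigns inside the question's apply lambda.

```python
import pandas as pd

df = pd.DataFrame({'a': range(12), 'b': ['red', 'green', 'blue'] * 4})
key = df.a // 5  # group indicator: 0 for a=0..4, 1 for a=5..9, 2 for a=10..11

# cumcount() restarts at 0 in every group and counts rows in their current order
within_group_pos = df.groupby(key).cumcount()
print(within_group_pos.tolist())
```

Because cumcount works on the whole frame at once, it avoids calling iterrows once per group.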

Filtering and then grouping gives you back the first element that you want. The count column is the within-group count.

In [132]: df2[df2.b=='red'].groupby('group').first()
Out[132]: 
       a    b  count
group               
0      0  red      0
1      6  red      1
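One caveat worth checking: GroupBy.first() returns the first non-null value of each column separately, so if the filtered frame contains NaN values, one output row can mix values taken from different input rows. That does not happen with the example data; the sketch below uses a small made-up frame (the column c is hypothetical) to show the difference, and GroupBy.head(1) as one way to keep each group's first row intact.

```python
import numpy as np
import pandas as pd

# Made-up data: in group 0, the first 'red' row has a missing value in column 'c'
df2 = pd.DataFrame({
    'group': [0, 0, 1],
    'b': ['red', 'red', 'red'],
    'c': [np.nan, 5.0, 7.0],
    'count': [0, 1, 0],
})
red = df2[df2.b == 'red']

# first(): per-column first non-null value, so group 0 takes c=5.0 from its second row
by_first = red.groupby('group').first()

# head(1): the first whole row of each group, NaN included, original row labels kept
by_head = red.groupby('group').head(1)

print(by_first)
print(by_head)
```

Since head(1) keeps the original row labels, you would call .set_index('group') on its result before reindexing by group key.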

You can include all of the group keys (e.g. for groups where nothing came back from your filter) this way:

In [133]: df2[df2.b=='red'].groupby('group').first().reindex(grp.groups.keys())
Out[133]: 
    a    b  count
0   0  red      0
1   6  red      1
2 NaN  NaN    NaN
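The steps above can be folded into a single function. This is a condensed sketch of the same cumcount, filter, first, reindex idea, not the answerer's code; the helper name first_match_position and its arguments are hypothetical, and it assumes Python 3. It returns a Series shaped like the one in the question, with NaN for groups that have no matching row.

```python
import pandas as pd


def first_match_position(df, key, mask):
    """Hypothetical helper: for each group of df defined by key, return the
    within-group position of the first row where mask is True, or NaN if none."""
    grp = df.groupby(key)
    pos = grp.cumcount()                          # position of each row inside its group
    first = pos[mask].groupby(key[mask]).first()  # first matching position per group
    return first.reindex(grp.size().index)        # groups with no match become NaN


df = pd.DataFrame({'a': range(12), 'b': ['red', 'green', 'blue'] * 4})
result = first_match_position(df, df.a // 5, df.b == 'red')
print(result)
```

Passing the predicate as a boolean mask (df.b == 'red') keeps the work vectorized instead of testing it row by row with iterrows.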
