Pandas group by start/end values


Question

Given some data like

pd.DataFrame(list('SxxxxxxxxESxxxxESxxxxxxxxxxxxE'))

how can I form it into chunks starting with 'S' and ending with 'E'?

The real data is of course more complex: one column holds data like the above, and there are other columns that I want to use groupby on.

The end goal is to be able to retrieve and act on all S/E-delimited chunks that meet criteria from other columns (e.g., given a mythical function group_chunks that does this, myData.groupby('Person').group_chunks().Value.sum()).

In response to a request for more realistic data and the desired output, the data looks something like:

import numpy as np
import pandas as pd

df = pd.DataFrame({'PID': [1]*12 + [2]*6,
                   'Cond': ['A']*6 + ['B']*6 + ['A']*6,
                   'Flag': ['START', 'DOWN', 'MOVE', 'MOVE', 'LIFT', 'END']*3,
                   'Value': np.random.random(18)})

   Cond   Flag  PID     Value
0     A  START    1  0.156338
1     A   DOWN    1  0.706541
2     A   MOVE    1  0.569177
3     A   MOVE    1  0.308874
4     A   LIFT    1  0.150780
5     A    END    1  0.553462
6     B  START    1  0.028738
7     B   DOWN    1  0.512303
8     B   MOVE    1  0.975988
9     B   MOVE    1  0.735695
10    B   LIFT    1  0.094430
11    B    END    1  0.467895
12    A  START    2  0.114679
13    A   DOWN    2  0.911095
14    A   MOVE    2  0.359117
15    A   MOVE    2  0.819148
16    A   LIFT    2  0.505313
17    A    END    2  0.874462

So using the mythical group_chunks (and keeping in mind that the number of rows between START and END is not always the same), I'd want to do something like

df.groupby('PID').group_chunks('Flag', 'START', 'END').Value.sum()

to get something like

   Cond   PID   Value.sum
0     A     1    2.445172
1     B     1    2.815049
2     A     2    3.583813

Recommended answer

Would a regular expression help here, rather than processing the string as a list of characters? For example:

import re
pattern = r'S.+?E'
re.findall(pattern, 'SxxxxxxxxESxxxxESxxxxxxxxxxxxE')
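The regex finds the chunks in a single string, but it does not carry the other columns along. For the DataFrame in the question, one possible alternative (not part of the original answer) is to number the chunks with a running count of start flags and then group on that number. The sketch below assumes the rows are already in order and that chunks do not nest; chunk_ids and the 'chunk' column are hypothetical names introduced here for illustration.

```python
import numpy as np
import pandas as pd


def chunk_ids(flags, start='START', end='END'):
    """Number the start/end-delimited chunk each row belongs to.

    Hypothetical helper: rows that fall outside any chunk get NaN.
    Assumes ordered rows and non-nested chunks.
    """
    # Running count of chunks opened up to and including this row.
    opened = flags.eq(start).cumsum()
    # Running count of chunks closed strictly before this row,
    # so the END row itself still counts as inside its chunk.
    closed = flags.eq(end).cumsum().shift(fill_value=0)
    # Keep the chunk number only where a chunk is currently open.
    return opened.where(opened > closed)


# Rebuild the example frame from the question (values are random).
df = pd.DataFrame({'PID': [1]*12 + [2]*6,
                   'Cond': ['A']*6 + ['B']*6 + ['A']*6,
                   'Flag': ['START', 'DOWN', 'MOVE', 'MOVE', 'LIFT', 'END']*3,
                   'Value': np.random.random(18)})

df['chunk'] = chunk_ids(df['Flag'])

# Sum Value per chunk, carrying PID and Cond along as group keys.
# groupby drops rows whose key is NaN, i.e. rows outside any chunk.
result = (df.groupby(['PID', 'Cond', 'chunk'], sort=False)['Value']
            .sum()
            .reset_index())
print(result)
```

The same helper should work on the single-character data as well, e.g. chunk_ids(pd.Series(list('SxxESxE')), 'S', 'E'). Grouping on the chunk number means chunks of different lengths need no special handling.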
