Get rows based on a condition and separate them into subsets


Problem description

I am trying to subset a dataset based on a condition, picking rows from each row that satisfies the condition until the next such starting value appears.

Condition: if column A == 0, column B should start with 'a'.

Dataset:

A   B
0   aa
1   ss
2   dd
3   ff
0   ee
1   ff
2   bb
3   gg
0   ar
1   hh
2   ww
0   jj
1   ll

Expected:

{0: {'A': [0,1,2,3], 'B': ['aa','ss','dd','ff']}, 1: {'A': [0,1,2], 'B': ['ar','hh','ww']}}

Each series starts at a row where column A == 0 and ends just before the next 0. In total the dataframe contains 4 such series.

Recommended answer

Do a cumsum on the condition to identify the groups, then groupby:

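# True on each row that opens a group (A == 0 and B starts with 'a');
# the running sum then gives every row its group number.
groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()

# One dict of column lists per group (df is the dataframe from the question).
{k: v.to_dict(orient='list') for k, v in df.groupby(groups)}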

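Output (the 'ae' in group 2 suggests the answer was run on data whose fifth B value was 'ae' rather than the 'ee' listed in the question):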

{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
 3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}
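
The expected result in the question also ends each subset at the next A == 0, even when that row's B does not start with 'a' (which is why 'jj' and 'll' are absent from it). One way to get that, offered as a minimal sketch rather than as the answerer's method: number the blocks on every A == 0, then keep only the blocks whose opening row satisfies the condition. The names block, opens_ok, keep and subsets are illustrative, and the dataframe is rebuilt here from the dataset exactly as listed in the question.

```python
import pandas as pd

# Dataset exactly as listed in the question (fifth B value is 'ee').
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# Every A == 0 opens a new block, whatever B is.
block = df["A"].eq(0).cumsum()

# Rows that open a block AND have B starting with 'a'.
opens_ok = df["A"].eq(0) & df["B"].str.startswith("a")

# Broadcast that flag to every row of the same block.
keep = opens_ok.groupby(block).transform("any")

# Renumber the surviving blocks from 0 and convert each to a dict of lists.
subsets = {
    i: g.to_dict(orient="list")
    for i, (_, g) in enumerate(df[keep].groupby(block[keep]))
}

print(subsets)
```

On this data the blocks opened by 'ee' and 'jj' are dropped, leaving the two subsets that start with 'aa' and 'ar'.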
