Get rows based on a condition and separate them into subsets
Question
I am trying to split a dataset into subsets based on a condition: each subset starts at a row that meets the condition and continues until the next row where column A is 0.
Condition: where column A == 0, column B should start with 'a'.
Dataset:
A B
0 aa
1 ss
2 dd
3 ff
0 ee
1 ff
2 bb
3 gg
0 ar
1 hh
2 ww
0 jj
1 ll
Expected:
{0: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']}, 1: {'A': [0, 1, 2], 'B': ['ar', 'hh', 'ww']}}
Each series starts at a row where column A == 0 and runs until the next 0. In total there are 4 such series in that dataframe, of which only the 2 whose first B value starts with 'a' appear in the expected output.
Recommended answer
Do a cumsum on the condition to identify the groups, then groupby:
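# Each row with A == 0 and B starting with 'a' opens a new group; cumsum turns those flags into group ids
groups = (df['A'].eq(0) & df['B'].str.startswith('a')).cumsum()
# Build one dict of column lists per group
{k: v.to_dict(orient='list') for k, v in df.groupby(groups)}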
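Output (this run evidently used 'ae' rather than 'ee' as the B value in the fifth row; with 'ee' as listed in the dataset, that row would not start a new group and its rows would stay in group 1):
{1: {'A': [0, 1, 2, 3], 'B': ['aa', 'ss', 'dd', 'ff']},
 2: {'A': [0, 1, 2, 3], 'B': ['ae', 'ff', 'bb', 'gg']},
 3: {'A': [0, 1, 2, 0, 1], 'B': ['ar', 'hh', 'ww', 'jj', 'll']}}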
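The grouping in the answer opens a new group only at rows that meet the full condition, so a block that begins with A == 0 but whose B value does not start with 'a' (such as the final jj, ll rows) is folded into the preceding group. If the goal is the expected output from the question, one possible variation is to number a block at every A == 0 and then keep only the blocks whose opening row qualifies. The following is a minimal sketch of that idea and is not part of the original answer; the names block, opens_ok, keep, and subsets are illustrative, and the dataframe is rebuilt here from the dataset in the question.

```python
import pandas as pd

# Dataset from the question, rebuilt by hand for a self-contained example.
df = pd.DataFrame({
    "A": [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 0, 1],
    "B": ["aa", "ss", "dd", "ff", "ee", "ff", "bb", "gg",
          "ar", "hh", "ww", "jj", "ll"],
})

# Every A == 0 opens a new block, whether or not B starts with 'a'.
block = df["A"].eq(0).cumsum()

# Rows that open a block AND satisfy the condition on B.
opens_ok = df["A"].eq(0) & df["B"].str.startswith("a")

# Keep only rows belonging to a block whose opening row qualifies.
keep = block.isin(block[opens_ok])

# Renumber the surviving blocks from 0 and convert each to a dict of lists.
subsets = {
    i: g.to_dict(orient="list")
    for i, (_, g) in enumerate(df[keep].groupby(block[keep]))
}
print(subsets)
```

With the dataset as listed, the blocks that open with ee and jj are dropped, and the two remaining blocks are keyed 0 and 1 as in the expected output.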