Splitting Pandas dataframe into multiple dataframes based on condition in column
Question
To prep my data correctly for an ML task, I need to split my original dataframe into multiple smaller dataframes. For every occurrence of 1 in column 'BOOL', I want all the rows above and including the row where the value is 1, i.e. n dataframes, where n is the number of occurrences of 1.
Sample data:
import pandas as pd
df = pd.DataFrame({"USER_ID": ['001', '001', '001', '001', '001'],
                   "VALUE": [1, 2, 3, 4, 5], "BOOL": [0, 1, 0, 1, 0]})
Expected output is 2 dataframes as shown:
and:
I have considered a for loop with if-else statements that appends rows, but it is highly inefficient for the dataset I am using. I am looking for a more Pythonic way of doing this.
Recommended answer
You can use np.split, which accepts an array of indices at which to split:
np.split(df, *np.where(df.BOOL == 1))
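To make the split points concrete, here is a minimal sketch on the sample data from the question. It assumes a default integer index and reproduces the "cut before each index" behaviour with plain iloc slices, so it does not depend on how np.split handles a DataFrame, which may vary between library versions. The names split_points, bounds and pieces are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"USER_ID": ["001"] * 5,
                   "VALUE": [1, 2, 3, 4, 5],
                   "BOOL": [0, 1, 0, 1, 0]})

# np.where returns a tuple holding one array of row positions where BOOL == 1.
split_points = np.where(df.BOOL == 1)[0]
print(split_points)  # [1 3]

# Cut *before* each split point, mirroring what np.split does with these indices.
bounds = [0, *split_points, len(df)]
pieces = [df.iloc[start:stop] for start, stop in zip(bounds[:-1], bounds[1:])]

for piece in pieces:
    print(piece, end="\n\n")
```

With these indices each row where BOOL == 1 starts a new piece, so the pieces cover rows 0, rows 1 to 2, and rows 3 to 4.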
If you want each row with BOOL == 1 to be included at the end of the preceding data frame, just add 1 to all the indices:
np.split(df, np.where(df.BOOL == 1)[0] + 1)
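Note that splitting this way also returns the rows after the last 1 as a final piece, while the question asks for exactly n dataframes. The sketch below is one possible way to handle that, not part of the original answer: split_after_flag and its keep_remainder parameter are hypothetical names, and the helper uses iloc slices in place of np.split.

```python
import numpy as np
import pandas as pd


def split_after_flag(df, column="BOOL", keep_remainder=False):
    """Hypothetical helper: cut df after every row where df[column] == 1."""
    # Exclusive end position of each piece (position of the 1, plus one).
    ends = np.flatnonzero(df[column].to_numpy() == 1) + 1
    # Each piece starts where the previous one ended; the first starts at 0.
    starts = np.concatenate(([0], ends[:-1]))
    pieces = [df.iloc[s:e] for s, e in zip(starts, ends)]

    # Rows after the last 1 do not end in a flagged row; keep them only on request.
    tail_start = ends[-1] if len(ends) else 0
    if keep_remainder and tail_start < len(df):
        pieces.append(df.iloc[tail_start:])
    return pieces


df = pd.DataFrame({"USER_ID": ["001"] * 5,
                   "VALUE": [1, 2, 3, 4, 5],
                   "BOOL": [0, 1, 0, 1, 0]})

for piece in split_after_flag(df):
    print(piece, end="\n\n")
```

On the sample data this yields two dataframes (rows 0 to 1 and rows 2 to 3), each ending in a row with BOOL == 1. Passing keep_remainder=True also returns the trailing row 4.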