Splitting a dataframe based on column values

Question

I have a dataframe like this:

 EndDate
2007-10-31              0
2007-11-30    -0.03384464
2007-12-31     -0.0336299
2008-01-31   -0.009448923
2008-02-29              0
2008-03-31    -0.05744962
2008-04-30     -0.0386942
2008-05-31              0
2008-06-30    -0.03624518
2008-07-31   -0.005286455
2008-08-31              0
2008-09-30     -0.1619864
2008-10-31     -0.2862122
2008-11-30     -0.2942793
2008-12-31     -0.2913253

Now I want to split the dataframe after every occurrence of 0, so the new dataframes should look like this:

Dataframe 1: 
    2007-11-30    -0.03384464
    2007-12-31     -0.0336299
    2008-01-31   -0.009448923
    2008-02-29              0

Dataframe 2:
    2008-03-31    -0.05744962
    2008-04-30     -0.0386942
    2008-05-31              0

Dataframe 3:
    2008-06-30    -0.03624518
    2008-07-31   -0.005286455
    2008-08-31              0

Dataframe 4:
    2008-09-30     -0.1619864
    2008-10-31     -0.2862122
    2008-11-30     -0.2942793
    2008-12-31     -0.2913253

I am not sure how this can be done. I could iterate over every row looking for 0, but I think there should be a better way.

Recommended answer

First, you can create group numbers by comparing the value column to zero and then taking the cumulative sum of the resulting boolean values. Each zero increments the counter, so every zero row starts a new group.

df['group_no'] = (df.val == 0).cumsum()
>>> df.head(6)
      EndDate       val  group_no
0  2007-10-31  0.000000         1
1  2007-11-30 -0.033845         1
2  2007-12-31 -0.033630         1
3  2008-01-31 -0.009449         1
4  2008-02-29  0.000000         2
5  2008-03-31 -0.057450         2
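
Note that with this numbering each zero opens a group, whereas the question asks for each zero to close one. A minimal sketch of one possible variant, assuming the same 'EndDate' and 'val' column names as above: shifting the zero flags down by one row before the cumulative sum keeps each zero with the rows before it. The names is_zero and frames are illustrative, not from the original answer.

```python
import pandas as pd

# Sample data copied from the question; column names follow the answer's output.
df = pd.DataFrame({
    'EndDate': ['2007-10-31', '2007-11-30', '2007-12-31', '2008-01-31',
                '2008-02-29', '2008-03-31', '2008-04-30', '2008-05-31',
                '2008-06-30', '2008-07-31', '2008-08-31', '2008-09-30',
                '2008-10-31', '2008-11-30', '2008-12-31'],
    'val': [0, -0.03384464, -0.0336299, -0.009448923, 0, -0.05744962,
            -0.0386942, 0, -0.03624518, -0.005286455, 0, -0.1619864,
            -0.2862122, -0.2942793, -0.2913253],
})

is_zero = df['val'].eq(0)

# Shift the flags down one row so the counter increases on the row AFTER each
# zero; the zero row itself then stays in the same group as the rows before it.
df['group_no'] = is_zero.shift(fill_value=False).cumsum()

# One sub-frame per group, with the helper column dropped again.
frames = {int(n): g.drop(columns='group_no') for n, g in df.groupby('group_no')}
```

With the sample data, group 0 holds only the leading zero row, and groups 1 to 4 correspond to Dataframes 1 to 4 in the question.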

Next, you can use a dictionary comprehension together with loc to select the rows belonging to each group_no. To find the last group number, I read the last value of the column using iat, which does position-based indexing.

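# range() excludes its stop value, so add 1 to include the last group
d = {i: df.loc[df.group_no == i, ['EndDate', 'val']] 
     for i in range(1, df.group_no.iat[-1] + 1)}

>>> d
{1:       EndDate       val
 0  2007-10-31  0.000000
 1  2007-11-30 -0.033845
 2  2007-12-31 -0.033630
 3  2008-01-31 -0.009449, 
 2:       EndDate       val
 4  2008-02-29  0.000000
 5  2008-03-31 -0.057450
 6  2008-04-30 -0.038694, 
 3:       EndDate       val
 7  2008-05-31  0.000000
 8  2008-06-30 -0.036245
 9  2008-07-31 -0.005286, 
 4:        EndDate       val
 10  2008-08-31  0.000000
 11  2008-09-30 -0.161986
 12  2008-10-31 -0.286212
 13  2008-11-30 -0.294279
 14  2008-12-31 -0.291325}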

EDIT: As suggested by @DSM, using groupby appears to be about 6x faster, based on a sample dataframe with 15k rows.

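# .ix was removed in pandas 1.0; iterating over a groupby yields
# (group key, sub-dataframe) pairs directly, so no row lookup is needed.
d = {n: group 
     for n, group in df2.groupby('group_no')}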
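
For reference, a self-contained sketch of the groupby approach on the first six rows of the sample data. Here df2 is a small stand-in frame (the 15k-row frame used for the timing is not shown in the post), and passing the group numbers as a separate Series instead of a stored column is an assumption made for illustration.

```python
import pandas as pd

# Small stand-in for df2; the first six rows of the question's data.
df2 = pd.DataFrame({
    'EndDate': ['2007-10-31', '2007-11-30', '2007-12-31',
                '2008-01-31', '2008-02-29', '2008-03-31'],
    'val': [0.0, -0.03384464, -0.0336299, -0.009448923, 0.0, -0.05744962],
})

# Group numbers as in the answer: every zero starts a new group.
group_no = df2['val'].eq(0).cumsum()

# groupby accepts a Series aligned on the index as the grouping key, so the
# helper column never has to be added to (or removed from) the frame.
d = {int(n): group for n, group in df2.groupby(group_no)}
```

Grouping by an external Series leaves each sub-frame with only the original columns, which avoids having to select ['EndDate', 'val'] afterwards.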
