Pandas DataFrame groupby overlapping intervals of variable length

Question

I am trying to group a DataFrame by 2 columns (see example below). For the first column, I want each value to belong to a group. For the second column, I want to group by overlapping intervals of unequal size.

My understanding is that pd.cut() only allows me to group by non-overlapping intervals.
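To illustrate that point, here is a minimal sketch (the sample values and bin edges are made up for illustration, not taken from the question) showing that pd.cut() places each value in exactly one interval, so a value sitting on a shared edge cannot belong to both neighbouring bins:

```python
import pandas as pd

# Hypothetical sample values; 2 lies on the edge shared by both bins.
values = pd.Series([1, 2, 2, 3])

# Two adjacent, non-overlapping bins: (0, 2] and (2, 3].
binned = pd.cut(values, bins=[0, 2, 3])

# Each value receives a single interval label, never two.
for value, interval in zip(values, binned):
    print(value, interval)
```

The value 2 lands only in (0, 2], which is why a different approach is needed for overlapping groups.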

Here is an example:

    0   1   2
0   0   4   1721
1   0   5   2353
2   0   6   58
3   0   7   524
4   1   1   1934
5   1   2   1318
6   1   2   1307
7   1   2   301
8   1   2   502
9   1   3   996
10  1   3   32

By grouping by column 0 and 1 I want:

0  1    2
0 [4,5] [1721,2353]
  [5,6] [2353,58]
  [6,7] [58,524]
1 [1,2] [1934,1318,1307,301,502]
  [2,3] [1318,1307,301,502,996,32]

I would then take the mean or std of column 2. Any suggestions? Thanks!

Recommended answer

Starting with:

    gr1  gr2   val
0     0    4  1721
1     0    5  2353
2     0    6    58
3     0    7   524
4     1    1  1934
5     1    2  1318
6     1    2  1307
7     1    2   301
8     1    2   502
9     1    3   996
10    1    3    32

First, create bins from values in gr2:

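import pandas as pd

# Sorted unique gr2 values become the bin edges; consecutive pairs form the bins.
bounds = df.gr2.sort_values().unique()
bins = list(zip(bounds[:-1], bounds[1:]))

def overlapping_bins(x):
    # Return every bin whose closed range [low, high] contains x.
    return pd.Series([b for b in bins if b[0] <= x <= b[1]])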
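To make the bin construction concrete, the sketch below rebuilds the sample frame from the answer and prints the bins. The int() conversion is only there for tidy printing and is not part of the original answer. Note that the edges come from the whole gr2 column, not from each gr1 group separately, so a (3, 4) bin is created even though neither group spans both 3 and 4.

```python
import pandas as pd

df = pd.DataFrame({
    "gr1": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "gr2": [4, 5, 6, 7, 1, 2, 2, 2, 2, 3, 3],
    "val": [1721, 2353, 58, 524, 1934, 1318, 1307, 301, 502, 996, 32],
})

bounds = df.gr2.sort_values().unique()
# Plain Python ints so the tuples print cleanly (an illustrative tweak).
bins = [(int(lo), int(hi)) for lo, hi in zip(bounds[:-1], bounds[1:])]
print(bins)  # [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7)]

def overlapping_bins(x):
    # Every bin whose closed range contains x.
    return pd.Series([b for b in bins if b[0] <= x <= b[1]])

print(overlapping_bins(2).tolist())  # [(1, 2), (2, 3)]
```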

Then assign val values to bins:

binned = df.gr2.apply(overlapping_bins).stack().reset_index(1, drop=True)
df = pd.concat([df, binned], axis=1).rename(columns={0: 'bins'}).drop('gr2', axis=1)
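The stacked Series repeats index labels, and concatenating along axis=1 with a repeated index may fail on some recent pandas versions. As a hedged alternative (not the answer's own method), the same assignment can be sketched with DataFrame.explode, available in pandas 0.25 and later. The name long_df is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    "gr1": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "gr2": [4, 5, 6, 7, 1, 2, 2, 2, 2, 3, 3],
    "val": [1721, 2353, 58, 524, 1934, 1318, 1307, 301, 502, 996, 32],
})

bounds = df["gr2"].sort_values().unique()
bins = [(int(lo), int(hi)) for lo, hi in zip(bounds[:-1], bounds[1:])]

# For each row, collect every bin whose closed range contains gr2.
df["bins"] = df["gr2"].apply(lambda x: [b for b in bins if b[0] <= x <= b[1]])

# explode() repeats a row once per element of its list, so a value
# that sits on a shared edge appears under both neighbouring bins.
long_df = df.explode("bins").drop(columns="gr2")
print(long_df)
```

The groupby step applies unchanged to long_df, since it has the same gr1, val and bins columns.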

And then .groupby() the resulting bins:

df.groupby(['gr1', 'bins']).val.apply(lambda x: x.tolist())

gr1  bins  
0    (3, 4)                             [1721]
     (4, 5)                       [1721, 2353]
     (5, 6)                         [2353, 58]
     (6, 7)                          [58, 524]
1    (1, 2)       [1934, 1318, 1307, 301, 502]
     (2, 3)    [1318, 1307, 301, 502, 996, 32]
     (3, 4)                          [996, 32]
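The output above contains (3, 4) rows that the question did not ask for, because the bin edges were taken from the whole gr2 column. One possible variation, offered as a sketch and not as part of the original answer, builds the edges separately inside each gr1 group and computes the mean and std the question mentions. The column names vals, mean and std are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    "gr1": [0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    "gr2": [4, 5, 6, 7, 1, 2, 2, 2, 2, 3, 3],
    "val": [1721, 2353, 58, 524, 1934, 1318, 1307, 301, 502, 996, 32],
})

rows = []
for gr1, group in df.groupby("gr1"):
    # Bin edges come only from the gr2 values present in this group.
    bounds = sorted(group["gr2"].unique())
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        # between() is inclusive on both ends by default.
        members = group.loc[group["gr2"].between(lo, hi), "val"]
        rows.append({
            "gr1": int(gr1),
            "bins": (int(lo), int(hi)),
            "vals": members.tolist(),
            "mean": members.mean(),
            "std": members.std(),
        })

result = pd.DataFrame(rows)
print(result)
```

With the sample data this yields the five groups listed in the question: three for gr1 = 0 and two for gr1 = 1.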
