Pandas Group/Merge Dataframe by Non-Periodic Series

Question

How do I group one DataFrame by another, possibly non-periodic, Series? Mock-up below:

This is the DataFrame to be split:

import numpy as np
import pandas as pd

i = pd.date_range(end="today", periods=20, freq="D").normalize()
v = np.random.randint(0, 100, size=len(i))
d = pd.DataFrame({"value": v}, index=i)

>>> d
            value
2021-02-06     48
2021-02-07      1
2021-02-08     86
2021-02-09     82
2021-02-10     40
2021-02-11     22
2021-02-12     63
2021-02-13     37
2021-02-14     41
2021-02-15     57
2021-02-16     30
2021-02-17     69
2021-02-18     63
2021-02-19     27
2021-02-20     23
2021-02-21     46
2021-02-22     66
2021-02-23     10
2021-02-24     91
2021-02-25     43

These are the splitting criteria: rows are grouped by the Series dates. A group consists of every DataFrame row whose timestamp falls in the half-open interval [s, s+1) between one Series date and the next, but, as with resampling, it would be nice to control the inclusion parameters (which end of each interval is closed).

s = pd.date_range(start="2019-10-14", freq="2W", periods=52).to_series()
s = s.drop(np.random.choice(s.index, 10, replace=False))
s = s.reset_index(drop=True)

>>> s[25:29]
25   2021-01-24
26   2021-02-07
27   2021-02-21
28   2021-03-07
dtype: datetime64[ns]
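The membership rule above can be sketched with `numpy.searchsorted`, which locates, for each timestamp, the interval between consecutive Series dates that contains it. This is a minimal sketch on small hand-picked dates (not the random data above); the helper `interval_positions` is hypothetical. Its `side` argument is one way to control the inclusion parameters: `side="right"` gives [s[k], s[k+1]) and `side="left"` gives (s[k], s[k+1]].

```python
import numpy as np
import pandas as pd


def interval_positions(index, edges, side="right"):
    """Hypothetical helper: for each timestamp in `index`, return the position k
    of the interval that starts at edges[k] and contains it, or -1 if none does.

    side="right" -> half-open intervals [edges[k], edges[k+1])
    side="left"  -> half-open intervals (edges[k], edges[k+1]]
    """
    edge_values = edges.to_numpy()
    # searchsorted counts the edges at or before (side="right") or strictly
    # before (side="left") each timestamp; subtracting 1 gives the interval start
    pos = np.searchsorted(edge_values, index.to_numpy(), side=side) - 1
    # Timestamps outside every interval get -1
    outside = (pos < 0) | (pos >= len(edge_values) - 1)
    return np.where(outside, -1, pos)


# Illustrative dates only (not the random data above)
edges = pd.Series(pd.to_datetime(["2021-01-24", "2021-02-07", "2021-02-21", "2021-03-07"]))
idx = pd.to_datetime(["2021-02-06", "2021-02-07", "2021-02-20", "2021-02-21", "2021-02-25"])

print(interval_positions(idx, edges).tolist())               # [0, 1, 1, 2, 2]
print(interval_positions(idx, edges, side="left").tolist())  # [0, 0, 1, 1, 2]
```

Note how 2021-02-07 and 2021-02-21 move to the previous interval when the closed end switches from left to right.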

And this is the example output... or something like it. The index is taken from the Series rather than the DataFrame.

>>> ???.sum()
            value
...
2021-01-24  47
2021-02-07  768
2021-02-21  334
...
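Since the desired index comes from the Series, one alternative (a different technique from the `pd.cut` answer below) is `pandas.merge_asof`: with `direction="backward"`, each row is matched to the latest Series date at or before its timestamp, which is the [s, s+1) rule. Reindexing the per-date sums by the Series then restores empty intervals as 0. This is a sketch on small hand-picked data; the column names `ts` and `start` are assumptions. Unlike `pd.cut`, rows on or after the last Series date are assigned to it rather than dropped, and passing `allow_exact_matches=False` would switch the intervals to (s, s+1].

```python
import pandas as pd

# Small hand-picked data (not the random data from the question)
d = pd.DataFrame(
    {"value": [5, 1, 2, 3, 4]},
    index=pd.to_datetime(["2021-02-06", "2021-02-07", "2021-02-20", "2021-02-21", "2021-02-25"]),
)
s = pd.Series(pd.to_datetime(["2021-01-10", "2021-01-24", "2021-02-07", "2021-02-21", "2021-03-07"]))

# merge_asof needs both sides sorted on their keys (both already are)
left = d.rename_axis("ts").reset_index()
right = pd.DataFrame({"start": s})

# direction="backward": each row gets the latest start <= ts, i.e. [s[k], s[k+1])
matched = pd.merge_asof(left, right, left_on="ts", right_on="start", direction="backward")

# Sum per interval start, then reindex by s so empty intervals appear as 0
result = matched.groupby("start")["value"].sum().reindex(s, fill_value=0)
print(result)
```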

Internally the groups would have this structure:

...
2021-01-10
        sum:        0
2021-01-24
    2021-02-06     47
        sum:       47
2021-02-07
    2021-02-07     52
    2021-02-08     56
    2021-02-09     21
    2021-02-10     39
    2021-02-11     86
    2021-02-12     30
    2021-02-13     20
    2021-02-14     76
    2021-02-15     91
    2021-02-16     70
    2021-02-17     34
    2021-02-18     73
    2021-02-19     41
    2021-02-20     79
        sum:      768
2021-02-21
    2021-02-21     90
    2021-02-22     75
    2021-02-23     12
    2021-02-24     70
    2021-02-25     87
        sum:      334
2021-03-07
        sum:        0
...

Solution

Looks like you can do:

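# right=False gives bins [s[k], s[k+1]), each labelled by its start s[k];
# rows outside [s.iloc[0], s.iloc[-1]) get NaN and are dropped by groupby
bucket = pd.cut(d.index, bins=s, labels=s[:-1], right=False)
# observed=False keeps empty bins, so they appear with a sum of 0
d.groupby(bucket, observed=False).sum()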
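A self-contained check of the `pd.cut` approach on small hand-picked data (not the question's random values) may help. It shows the sums per interval start, including an empty interval, and how `right=True` moves timestamps that coincide with a Series date into the preceding interval, giving (s[k], s[k+1]].

```python
import pandas as pd

# Small hand-picked data (not the random data from the question)
d = pd.DataFrame(
    {"value": [5, 1, 2, 3, 4]},
    index=pd.to_datetime(["2021-02-06", "2021-02-07", "2021-02-20", "2021-02-21", "2021-02-25"]),
)
s = pd.Series(pd.to_datetime(["2021-01-10", "2021-01-24", "2021-02-07", "2021-02-21", "2021-03-07"]))

# right=False -> intervals [s[k], s[k+1]), each labelled by its start s[k]
bucket = pd.cut(d.index, bins=s, labels=s[:-1], right=False)
# observed=False keeps the empty 2021-01-10 interval with a sum of 0
result = d.groupby(bucket, observed=False)["value"].sum()
print(result.tolist())  # [0, 5, 3, 7]

# right=True flips the inclusion to (s[k], s[k+1]]
bucket_right = pd.cut(d.index, bins=s, labels=s[:-1], right=True)
result_right = d.groupby(bucket_right, observed=False)["value"].sum()
print(result_right.tolist())  # [0, 6, 5, 4]
```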
