Pandas Group/Merge Dataframe by Non-Periodic Series
How do I group a DataFrame by a separate, possibly non-periodic Series? Mock-up below:
This is the DataFrame to be split:
import numpy as np
import pandas as pd

i = pd.date_range(end="today", periods=20, freq="D").normalize()  # last 20 days, at midnight
v = np.random.randint(0, 100, size=len(i))
d = pd.DataFrame({"value": v}, index=i)
>>> d
value
2021-02-06 48
2021-02-07 1
2021-02-08 86
2021-02-09 82
2021-02-10 40
2021-02-11 22
2021-02-12 63
2021-02-13 37
2021-02-14 41
2021-02-15 57
2021-02-16 30
2021-02-17 69
2021-02-18 63
2021-02-19 27
2021-02-20 23
2021-02-21 46
2021-02-22 66
2021-02-23 10
2021-02-24 91
2021-02-25 43
This is the splitting criterion: group by the dates in the Series. Writing s[k] for the k-th date, a group consists of every DataFrame row, in order, whose index date falls in the half-open interval [s[k], s[k+1]). As with resampling, it would be nice to control which end of the interval is inclusive.
s = pd.date_range(start="2019-10-14", freq="2W", periods=52).to_series()
s = s.drop(np.random.choice(s.index, 10, replace=False))
s = s.reset_index(drop=True)
>>> s[25:29]
25 2021-01-24
26 2021-02-07
27 2021-02-21
28 2021-03-07
dtype: datetime64[ns]
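To make the interval rule concrete, here is a minimal sketch that assigns a few dates to their left boundary by hand. It is not the grouping itself; the boundaries and dates are hypothetical, hand-picked values, and it assumes every date lies on or after the first boundary.

```python
import numpy as np
import pandas as pd

# Hypothetical, hand-picked boundaries (irregularly spaced is fine, but sorted).
s = pd.Series(pd.to_datetime(["2021-01-24", "2021-02-07", "2021-02-21", "2021-03-07"]))
# Hypothetical dates to classify; all lie on or after the first boundary.
dates = pd.to_datetime(["2021-02-06", "2021-02-07", "2021-02-20", "2021-02-21"])

# side="right" counts how many boundaries are <= each date, so subtracting 1
# gives the position k of the interval [s[k], s[k+1]) containing that date.
k = np.searchsorted(s.to_numpy(), dates.to_numpy(), side="right") - 1
left_edges = s.iloc[k].reset_index(drop=True)

for date, edge in zip(dates, left_edges):
    print(date.date(), "->", edge.date())
```

A date that sits exactly on a boundary (2021-02-07, 2021-02-21) starts a new group, which is what the closed left end of [s[k], s[k+1]) means. Switching to side="left" would instead put such a date in the preceding interval.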
And this is the desired output, or something like it. The index is taken from the Series rather than the DataFrame.
>>> ???.sum()
value
...
2021-01-24 47
2021-02-07 768
2021-02-21 334
...
Internally the groups would have this structure:
...
2021-01-10
    sum: 0
2021-01-24
    2021-02-06 47
    sum: 47
2021-02-07
    2021-02-07 52
    2021-02-08 56
    2021-02-09 21
    2021-02-10 39
    2021-02-11 86
    2021-02-12 30
    2021-02-13 20
    2021-02-14 76
    2021-02-15 91
    2021-02-16 70
    2021-02-17 34
    2021-02-18 73
    2021-02-19 41
    2021-02-20 79
    sum: 768
2021-02-21
    2021-02-21 90
    2021-02-22 75
    2021-02-23 12
    2021-02-24 70
    2021-02-25 87
    sum: 334
2021-03-07
    sum: 0
...
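It looks like you can do this with pd.cut (note that the keyword is labels, not label):
# Bin each index date into [s[k], s[k+1]) and label each bin by its left edge.
bucket = pd.cut(d.index, bins=s, labels=s[:-1], right=False)
d.groupby(bucket, observed=False).sum()  # observed=False keeps empty bins (sum 0)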
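For anyone who wants to try this without random data, below is a small self-contained sketch of the same pd.cut approach. The all-ones values, the five boundary dates, and the helper name sum_by_boundaries are assumptions made for illustration, not part of the original question. It also shows that the right parameter is the inclusion control the question asks about.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so the result is reproducible: 20 daily rows, each 1.
idx = pd.date_range("2021-02-06", periods=20, freq="D")
d = pd.DataFrame({"value": np.ones(len(idx), dtype=int)}, index=idx)

# Hypothetical irregular boundaries around the data (must be sorted).
s = pd.Series(pd.to_datetime(
    ["2021-01-10", "2021-01-24", "2021-02-07", "2021-02-21", "2021-03-07"]
))


def sum_by_boundaries(frame, edges, right=False):
    """Sum `frame` between consecutive `edges`, labelling bins by left edge."""
    bucket = pd.cut(
        frame.index,
        bins=edges,
        labels=edges.iloc[:-1].to_list(),  # one label per bin: its left edge
        right=right,  # False: [a, b) bins; True: (a, b] bins
    )
    # observed=False keeps bins that contain no rows, which then sum to 0.
    return frame.groupby(bucket, observed=False).sum()


left_closed = sum_by_boundaries(d, s, right=False)
right_closed = sum_by_boundaries(d, s, right=True)
print(left_closed)
print(right_closed)
```

With this all-ones data the left-closed sums are 0, 1, 14, 5 and the right-closed sums are 0, 2, 14, 4. The only difference is which bin receives the rows dated exactly on a boundary (2021-02-07 and 2021-02-21). Rows that fall outside every bin get a missing bucket from pd.cut and are dropped by groupby, so the boundaries should span the whole index if every row needs to be counted.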