python按周或月分割大 pandas 数据框,并根据这些sp分组数据 [英] python split a pandas data frame by week or month and group the data based on these sp
问题描述
DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
03/09/2012 8273 31215
03/09/2012 8367 28160
03/09/2012 8959 32037
03/09/2012 9292 30118
03/09/2012 9532 34200
03/09 / 2012 9705 27240
03/09/2012 10085 31431
03/09/2012 10220 22555
04/09/2012 6078 41126
04/09/2012 7569 31101
04/09/2012 8273 30994
04/09/2012 8959 30064
04/09/2012 9532 34655
04/09/2012 9705 26475
04/09/2012 10085 31443
04/09/2012 10220 33970
05/09/2012 2073 28221
05/09/2012 6078 27894
05/09/2012 7569 29012
05/09/2012 8239 42208
05/09/2012 8273 31128
05/09/2012 8367 27993
05/09/2012 8959 20669
05/09/2012 9292 33070
05/09/2012 9532 8189
05/09/2012 9705 27540
05/09/2012 10085 28798
05/09/2012 10220 23164
06 / 09/2012 2073 28350
06/09/2012 6078 35648
06/09/2012 7042 27129
06/09/2012 7569 31546
06/09/2012 8239 39945
06/09/2012 8273 31107
06/09/2012 8367 27795
06/09/2012 9292 32974
06/09/2012 9532 30320
06/09/2012 9705 37462
06/09/2012 10085 31703
06/09/2012 10220 7807
06/09/2012 14573 186
07/09/2012 0 0
07 / 09/2012 0 0
07/09/2012 2073 28036
07/09/2012 6078 31969
07/09/2012 7569 32941
07/09/2012 8273 30073
07/09/2012 8367 29391
07/09/2012 9292 31927
07/09/2012 9532 30127
07/09/2012 9705 27604
07 / 09/2012 10085 28108
08/09/2012 2073 28463
10/09/2012 6078 31266
10/09/2012 8239 16390
10/09/2012 8273 31140
10/09/2012 8959 30858
10/09/2012 9532 30794
10/09/2012 9705 28752
11/09/2012 0 0
11/09 / 2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 2073 28159
11/09/2012 6078 36835
11/09/2012 8239 45354
11/09/2012 8273 30922
11/09/2012 8367 31382
11/09/2012 8959 29670
11/09/2012 9292 33582
11 / 09/2012 9705 29394
11/09/2012 10085 17140
12/09/2012 2073 28283
12/09/2012 6078 31139
12/09/2012 7042 35063
12/09/2012 8273 31075
12/09/2012 8367 29795
12/09/2012 9292 33496
12/09/2012 9532 31669
12 / 09/2012 9705 26166
12/09/2012 10085 29889
12/09/2012 10220 35656
13/09/2012 2073 28144
13/09/2012 6078 30544
13/09/2012 7097 30866
13/09/2012 8273 30772
13/09/2012 8367 32387
13/09/2012 8959 29307
13/09 / 2012 9292 32348
13/09/2012 9532 28137
13/09/2012 9705 28823
13/09/2012 10085 31543
13/09/2012 10220 28293
14/09/2012 0 12433
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 12434
14/09 / 2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 12433
14/09/2012 0 0
14/09/2012 0 12433
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 1720
14 / 09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 384
14/09/2012 0 0
14 / 09/2012 0 0
14/09/2012 0 0
14/09/2012 0 383
14/09/2012 2073 28438
14/09/2012 6078 27255
14/09/2 012 8273 29989
14/09/2012 8959 26892
14/09/2012 9292 33202
14/09/2012 9532 30862
14/09/2012 9705 26857
14/09/2012 10085 32657
14/09/2012 10220 27296
15/09/2012 6078 3832
17/09/2012 6078 30004
17/09/2012 7569 30390
17/09/2012 8239 41421
17/09/2012 8273 26337
17/09/2012 8367 31631
17/09/2012 8959 17989
17/09/2012 9292 35703
17/09/2012 9532 36542
17/09/2012 9705 27488
17/09/2012 10085 30849
17/09/2012 10220 32575
18/09/2012 2073 28293
18/09/2012 6078 27450
18/09/2012 7569 30323
18/09/2012 8239 38481
18 / 09/2012 8273 31154
18/09/2012 8367 27944
18/09/2012 8959 28196
18/09/2012 9292 30844
18/09/2012 9532 33128
18/09/2012 9705 32100
19/09/2012 2073 28227
19/09/2012 6078 32243
19/09/2012 7569 29041
19 / 09/2012 8239 42791
19/09/2012 8273 30966
19/09/2012 8367 26420
19/09/2012 8959 29394
19/09/2012 9292 14865
19 / 09/2012 9532 23618
19/09/2012 10085 31614
19/09/2012 10220 8686
20/09/2012 2073 28260
20/09/2012 6078 30446
20/09/2012 7097 34909
20/09/2012 7569 30869
20/09/2012 8273 31079
20/09/2012 8367 30162
20 / 09/2012 9292 13104
20/09/2012 9532 36614
20/09/2012 9705 35617
20/09/2012 10085 31821
20/09/2012 10220 30055
20/09/2012 14573 468
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09 / 2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 3
21/09/2012 0 0
21/09/2012 0 0
21/09 / 2012 0 3
21/09/2012 2073 28308
21/09/2012 6078 33833
21/09/2012 7569 32335
21/09/2 012 9292 33824
21/09/2012 9532 33376
21/09/2012 10220 21002
22/09/2012 2073 28402
23/09/2012 2073 28109
24/09/2012 2073 28431
24/09/2012 6078 30027
24/09/2012 7097 31914
24/09/2012 8239 35617
24/09/2012 8273 30670
24/09/2012 8367 29084
24/09/2012 8959 31023
24/09/2012 9292 34394
24/09/2012 9532 31255
24/09/2012 9705 18758
24/09/2012 10085 29290
24/09/2012 10220 33230
25/09/2012 2073 28506
25/09/2012 6078 32043
25/09/2012 7042 34953
25/09/2012 7569 30898
25/09/2012 8239 41297
25/09/2012 8273 31012
25 / 09/2012 8367 29645
25/09/2012 8959 29904
25/09/2012 9532 37875
25/09/2012 9705 13280
25/09/2012 10085 35023
25/09/2012 10220 31359
26/09/2012 2073 28388
26/09/2012 6078 29765
26/09/2012 7097 31561
26 / 09/2012 756 9 29151
26/09/2012 8239 40369
26/09/2012 8367 28174
26/09/2012 8959 26554
26/09/2012 9292 32104
26/09/2012 9532 33194
26/09/2012 9705 30377
26/09/2012 10085 31503
26/09/2012 10220 28310
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27 / 09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 2073 28491
27/09/2012 6078 31137
27/09/2012 8239 38403
27/09/2012 8273 31117
27/09/2012 8367 28462
27/09/2012 9292 32387
27 / 09/2012 9532 23023
27/09/2012 9705 32790
27/09/2012 10085 33460
27/09/2012 10220 31782
28/09/2012 0 161
28/09/2012 2073 28381
28/09/2012 7569 32322
28/09/2012 8239 38362
28/09/2012 8273 30533
28/09 / 2012 8959 17128
28/09/2012 9292 32484
28/09/2012 9532 18586
28/09/2012 9705 27902
29/09/2012 2073 28583
- 以上是一个具有百万条记录的数据框的示例
- 如何按周或月份和总和秒列按成本进行切片或分组中心。 *
- 我已经阅读/尝试了通过搜索
列出的本网站上显示的30条文章项目熊猫,python,groupby,分裂,数据框,周,成功。 - 我使用的是python 2.7和pandas 0.9。
- 我已经阅读了pandas 0.9教程中的时间序列/日期功能部分, t
使数据帧工作。我想使用其中的功能,例如商业周
预期输出
DateOccurred CostCentre TimeDifference
/ pre>
2012-03-11 0 500000
2012-03-11 2073 570000
2012- 03-18 0 650000
2012-03-18 2073 425000
2012-03-25 0 378000
2012-04-25 2073 480000
解决方案首先可以按CostCentre分组,然后使用Series / DataFrame
resample()
?在[72]中:中心= {}
在[73] :for center,idx in df.groupby(CostCentre)。groups.iteritems():
....:timediff = df.ix [idx] .set_index(Date)['TimeDifference']
....:centers [center] = timediff.resample(W,how = sum)
在[77]中:pd.concat(centers,names = ['CostCentre '])
Out [77]:
CostCentre Date
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09- 09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09 -16 160871
2012-09-23 153976
2012-09-30 122972
其他细节
当
parse_dates
是True c code code code code code code $ c $ p>
在[28]中:df = pd.read_clipboard(sep ='+',parse_dates = True,index_col = 0,
....:dayfirst = True)
在[30]中:df.head()
输出[30]:
CostCentre TimeDifference
DateOccurred
2012-09-03 2073 28138
2012-09-03 6078 34844
2012-09-03 8273 31215
2012-09-03 8367 28160
2012-09-03 8959 32037
由于resample()需要一个TimeSeries索引的帧/系列,所以在创建过程中设置索引不需要设置索引每个组别。 GroupBy对象也有一个apply方法,它基本上是与pd.concat()完成的combine步骤中的句法糖。
In [37]:x = df.groupby(CostCentre)。apply(lambda df:
....:df ['TimeDifference']。resample(W,how = sum))
在[38]:x.head(12)
出[38]:
CostCentre DateOccurred
0 2012-09-09 0
2012 -09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972
DateOccurred CostCentre TimeDifference
03/09/2012 2073 28138
03/09/2012 6078 34844
03/09/2012 8273 31215
03/09/2012 8367 28160
03/09/2012 8959 32037
03/09/2012 9292 30118
03/09/2012 9532 34200
03/09/2012 9705 27240
03/09/2012 10085 31431
03/09/2012 10220 22555
04/09/2012 6078 41126
04/09/2012 7569 31101
04/09/2012 8273 30994
04/09/2012 8959 30064
04/09/2012 9532 34655
04/09/2012 9705 26475
04/09/2012 10085 31443
04/09/2012 10220 33970
05/09/2012 2073 28221
05/09/2012 6078 27894
05/09/2012 7569 29012
05/09/2012 8239 42208
05/09/2012 8273 31128
05/09/2012 8367 27993
05/09/2012 8959 20669
05/09/2012 9292 33070
05/09/2012 9532 8189
05/09/2012 9705 27540
05/09/2012 10085 28798
05/09/2012 10220 23164
06/09/2012 2073 28350
06/09/2012 6078 35648
06/09/2012 7042 27129
06/09/2012 7569 31546
06/09/2012 8239 39945
06/09/2012 8273 31107
06/09/2012 8367 27795
06/09/2012 9292 32974
06/09/2012 9532 30320
06/09/2012 9705 37462
06/09/2012 10085 31703
06/09/2012 10220 7807
06/09/2012 14573 186
07/09/2012 0 0
07/09/2012 0 0
07/09/2012 2073 28036
07/09/2012 6078 31969
07/09/2012 7569 32941
07/09/2012 8273 30073
07/09/2012 8367 29391
07/09/2012 9292 31927
07/09/2012 9532 30127
07/09/2012 9705 27604
07/09/2012 10085 28108
08/09/2012 2073 28463
10/09/2012 6078 31266
10/09/2012 8239 16390
10/09/2012 8273 31140
10/09/2012 8959 30858
10/09/2012 9532 30794
10/09/2012 9705 28752
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 0 0
11/09/2012 2073 28159
11/09/2012 6078 36835
11/09/2012 8239 45354
11/09/2012 8273 30922
11/09/2012 8367 31382
11/09/2012 8959 29670
11/09/2012 9292 33582
11/09/2012 9705 29394
11/09/2012 10085 17140
12/09/2012 2073 28283
12/09/2012 6078 31139
12/09/2012 7042 35063
12/09/2012 8273 31075
12/09/2012 8367 29795
12/09/2012 9292 33496
12/09/2012 9532 31669
12/09/2012 9705 26166
12/09/2012 10085 29889
12/09/2012 10220 35656
13/09/2012 2073 28144
13/09/2012 6078 30544
13/09/2012 7097 30866
13/09/2012 8273 30772
13/09/2012 8367 32387
13/09/2012 8959 29307
13/09/2012 9292 32348
13/09/2012 9532 28137
13/09/2012 9705 28823
13/09/2012 10085 31543
13/09/2012 10220 28293
14/09/2012 0 12433
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 12434
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 12433
14/09/2012 0 0
14/09/2012 0 12433
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 1720
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 384
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 0
14/09/2012 0 383
14/09/2012 2073 28438
14/09/2012 6078 27255
14/09/2012 8273 29989
14/09/2012 8959 26892
14/09/2012 9292 33202
14/09/2012 9532 30862
14/09/2012 9705 26857
14/09/2012 10085 32657
14/09/2012 10220 27296
15/09/2012 6078 3832
17/09/2012 6078 30004
17/09/2012 7569 30390
17/09/2012 8239 41421
17/09/2012 8273 26337
17/09/2012 8367 31631
17/09/2012 8959 17989
17/09/2012 9292 35703
17/09/2012 9532 36542
17/09/2012 9705 27488
17/09/2012 10085 30849
17/09/2012 10220 32575
18/09/2012 2073 28293
18/09/2012 6078 27450
18/09/2012 7569 30323
18/09/2012 8239 38481
18/09/2012 8273 31154
18/09/2012 8367 27944
18/09/2012 8959 28196
18/09/2012 9292 30844
18/09/2012 9532 33128
18/09/2012 9705 32100
19/09/2012 2073 28227
19/09/2012 6078 32243
19/09/2012 7569 29041
19/09/2012 8239 42791
19/09/2012 8273 30966
19/09/2012 8367 26420
19/09/2012 8959 29394
19/09/2012 9292 14865
19/09/2012 9532 23618
19/09/2012 10085 31614
19/09/2012 10220 8686
20/09/2012 2073 28260
20/09/2012 6078 30446
20/09/2012 7097 34909
20/09/2012 7569 30869
20/09/2012 8273 31079
20/09/2012 8367 30162
20/09/2012 9292 13104
20/09/2012 9532 36614
20/09/2012 9705 35617
20/09/2012 10085 31821
20/09/2012 10220 30055
20/09/2012 14573 468
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 3
21/09/2012 0 0
21/09/2012 0 0
21/09/2012 0 3
21/09/2012 2073 28308
21/09/2012 6078 33833
21/09/2012 7569 32335
21/09/2012 9292 33824
21/09/2012 9532 33376
21/09/2012 10220 21002
22/09/2012 2073 28402
23/09/2012 2073 28109
24/09/2012 2073 28431
24/09/2012 6078 30027
24/09/2012 7097 31914
24/09/2012 8239 35617
24/09/2012 8273 30670
24/09/2012 8367 29084
24/09/2012 8959 31023
24/09/2012 9292 34394
24/09/2012 9532 31255
24/09/2012 9705 18758
24/09/2012 10085 29290
24/09/2012 10220 33230
25/09/2012 2073 28506
25/09/2012 6078 32043
25/09/2012 7042 34953
25/09/2012 7569 30898
25/09/2012 8239 41297
25/09/2012 8273 31012
25/09/2012 8367 29645
25/09/2012 8959 29904
25/09/2012 9532 37875
25/09/2012 9705 13280
25/09/2012 10085 35023
25/09/2012 10220 31359
26/09/2012 2073 28388
26/09/2012 6078 29765
26/09/2012 7097 31561
26/09/2012 7569 29151
26/09/2012 8239 40369
26/09/2012 8367 28174
26/09/2012 8959 26554
26/09/2012 9292 32104
26/09/2012 9532 33194
26/09/2012 9705 30377
26/09/2012 10085 31503
26/09/2012 10220 28310
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 0 0
27/09/2012 2073 28491
27/09/2012 6078 31137
27/09/2012 8239 38403
27/09/2012 8273 31117
27/09/2012 8367 28462
27/09/2012 9292 32387
27/09/2012 9532 23023
27/09/2012 9705 32790
27/09/2012 10085 33460
27/09/2012 10220 31782
28/09/2012 0 161
28/09/2012 2073 28381
28/09/2012 7569 32322
28/09/2012 8239 38362
28/09/2012 8273 30533
28/09/2012 8959 17128
28/09/2012 9292 32484
28/09/2012 9532 18586
28/09/2012 9705 27902
29/09/2012 2073 28583
- Above is a sample of a dataframe which has a million records
- How can I slice or group it by Week or Month and sum seconds column by cost centre.?*
- I have read/tried 30 of the articles on this site which appear by doing a search for
List item pandas, python, groupby, split, dataframe, week with out success. - I am using python 2.7 and pandas 0.9.
- I've read the Time Series / Date functionality section in the pandas 0.9 tutorial but couldn't make anything work with a dataframe. I would like to use the features in there such as Business week
Expected Output
DateOccurred CostCentre TimeDifference
2012-03-11 0 500000
2012-03-11 2073 570000
2012-03-18 0 650000
2012-03-18 2073 425000
2012-03-25 0 378000
2012-04-25 2073 480000
Perhaps group by CostCentre first, then use Series/DataFrame resample()
?
In [72]: centers = {}
In [73]: for center, idx in df.groupby("CostCentre").groups.iteritems():
....: timediff = df.ix[idx].set_index("Date")['TimeDifference']
....: centers[center] = timediff.resample("W", how=sum)
In [77]: pd.concat(centers, names=['CostCentre'])
Out[77]:
CostCentre Date
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972
Additional details:
When parse_dates
is True
for the pd.read_* functions, index_col
must also be set.
In [28]: df = pd.read_clipboard(sep=' +', parse_dates=True, index_col=0,
....: dayfirst=True)
In [30]: df.head()
Out[30]:
CostCentre TimeDifference
DateOccurred
2012-09-03 2073 28138
2012-09-03 6078 34844
2012-09-03 8273 31215
2012-09-03 8367 28160
2012-09-03 8959 32037
Since resample() requires a TimeSeries-indexed frame/series, setting the index during creation eliminates the need to set the index for each group individually. GroupBy objects also have an apply method, which is basically syntactic sugar around the "combine" step done with pd.concat() above.
In [37]: x = df.groupby("CostCentre").apply(lambda df:
....: df['TimeDifference'].resample("W", how=sum))
In [38]: x.head(12)
Out[38]:
CostCentre DateOccurred
0 2012-09-09 0
2012-09-16 89522
2012-09-23 6
2012-09-30 161
2073 2012-09-09 141208
2012-09-16 113024
2012-09-23 169599
2012-09-30 170780
6078 2012-09-09 171481
2012-09-16 160871
2012-09-23 153976
2012-09-30 122972
这篇关于python按周或月分割大 pandas 数据框,并根据这些sp分组数据的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!