TimeSeries with a groupby in Pandas
Question
I would like to look at TimeSeries data for every client over various time periods in Pandas.
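import pandas as pd
import numpy as np
import random
clients = np.random.randint(1, 11, size=100)
dates = pd.date_range('20130101', periods=365)
# random.sample needs a real sequence, so convert the DatetimeIndex to a list
OrderDates = random.sample(list(dates), 100)
Values = np.random.randint(10, 250, size=100)
df = pd.DataFrame({'Client': clients, 'OrderDate': OrderDates, 'Value': Values})
# DataFrame.sort no longer exists in pandas; sort_values replaces it,
# and ascending takes booleans rather than the strings 'True'
df = df.sort_values(['OrderDate', 'Client'], ascending=[True, True])
df.head()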
What I am trying to accomplish is to get the count and the sum of the 'Value' column, grouped by 'Client', for various time periods (monthly, quarterly, yearly). I will likely build 3 different dataframes for this data, then make the dataframes 'wide'.
For quarterly, I would expect something like this:
Client  OrderDate   NumberofEntries  SumofValues
1       2013-03-31  7                28
1       2013-06-30  2                7
1       2013-09-30  6                20
1       2013-12-31  1                3
2       2013-03-31  1                4
2       2013-06-30  2                8
2       2013-09-30  3                17
2       2013-12-31  4                24
I could add a column to the data frame holding the quarter (or month, or year) of each entry and then use the Pandas groupby function, but that seems like extra work when I should be using TimeSeries.
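For reference, the column-based workaround described above might look like the following minimal sketch. The small hand-made frame and the `Quarter` column name are illustrative assumptions, not part of the original data.

```python
import pandas as pd

# Small hand-made sample (illustrative only, not the random data above)
df = pd.DataFrame({
    'Client': [1, 1, 1, 2, 2],
    'OrderDate': pd.to_datetime(['2013-01-15', '2013-02-20', '2013-05-03',
                                 '2013-03-09', '2013-11-30']),
    'Value': [10, 20, 30, 40, 50],
})

# Extra step: tag each row with its calendar quarter (hypothetical column name)
df['Quarter'] = df['OrderDate'].dt.to_period('Q')

# Plain groupby on both keys, then count and sum the values
summary = (
    df.groupby(['Client', 'Quarter'])['Value']
      .agg(['count', 'sum'])
      .rename(columns={'count': 'NumberofEntries', 'sum': 'SumofValues'})
)
print(summary)
```

This works, but the quarter has to be derived by hand for every period of interest, which is the extra work the question is trying to avoid.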
I've read the documentation and reviewed a TimeSeries demonstration by Wes, but I don't see a way to do a groupby on the Client and then apply the TimeSeries functionality over the time periods I am trying to build. (Alternatively, I could run a for loop and build the dataframe that way, but again, that seems like more work than there should be.)
Is there a way to combine a groupby process with TimeSeries?
Recommended answer
A slight alternative is to set_index before doing the groupby:
In [11]: df.set_index('OrderDate', inplace=True)
In [12]: g = df.groupby('Client')
In [13]: g['Value'].resample('Q').agg(['sum', len])
Out[13]:
                   sum  len
Client OrderDate
1      2013-03-31  239    1
       2013-06-30   83    1
       2013-09-30  249    2
       2013-12-31  506    3
2      2013-03-31  581    4
       2013-06-30  569    4
       2013-09-30  316    4
       2013-12-31  465    5
...
Note: you don't need to sort the data frame before doing this.
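As a self-contained illustration of the same set_index, groupby, resample pattern, here is a minimal sketch written for a recent pandas release, where the aggregations are passed through `.agg`. The hand-made frame is an assumption for demonstration, and the offset object `pd.offsets.QuarterEnd()` is used in place of a frequency string only because the spelling of the quarter alias differs between pandas versions.

```python
import pandas as pd

# Small hand-made sample (illustrative only)
df = pd.DataFrame({
    'Client': [1, 1, 1, 2, 2],
    'OrderDate': pd.to_datetime(['2013-01-15', '2013-02-20', '2013-05-03',
                                 '2013-03-09', '2013-05-30']),
    'Value': [10, 20, 30, 40, 50],
})

# Quarter-end bins (Mar 31, Jun 30, Sep 30, Dec 31)
quarter_end = pd.offsets.QuarterEnd()

result = (
    df.set_index('OrderDate')        # resample needs a datetime index
      .groupby('Client')['Value']    # one time series per client
      .resample(quarter_end)         # bin each client's series by quarter
      .agg(['sum', 'count'])         # total value and number of entries
)
print(result)
```

Swapping in `pd.offsets.MonthEnd()` or `pd.offsets.YearEnd()` should give the monthly and yearly summaries, and `result.unstack('Client')` is one way to make the result wide.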