TimeSeries with a groupby in Pandas

Question

I would like to look at TimeSeries data for every client over various time periods in Pandas.

import random

import numpy as np
import pandas as pd

clients = np.random.randint(1, 11, size=100)
dates = pd.date_range('20130101', periods=365)
# random.sample needs a sequence, so convert the DatetimeIndex to a list first
OrderDates = random.sample(list(dates), 100)
Values = np.random.randint(10, 250, size=100)

df = pd.DataFrame({'Client': clients, 'OrderDate': OrderDates, 'Value': Values})

# DataFrame.sort was removed from pandas; sort_values is its replacement
df = df.sort_values(['OrderDate', 'Client'], ascending=[True, True])

df.head()

What I am trying to accomplish is to get the count and the sum of the 'Value' column, grouped by 'Client' for various time periods (Monthly, Quarterly, Yearly - I will likely build 3 different dataframes for this data, then make the dataframes 'wide').

For Quarterly, I would expect something like this:

Client      OrderDate       NumberofEntries SumofValues
1           2013-03-31      7               28
1           2013-06-30      2               7
1           2013-09-30      6               20
1           2013-12-31      1               3
2           2013-03-31      1               4
2           2013-06-30      2               8
2           2013-09-30      3               17
2           2013-12-31      4               24

I could add a column to the data frame holding the quarter (or month, or year) of each entry and then use the Pandas groupby function, but that seems like extra work when I should be using TimeSeries.
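For reference, the extra-column approach described above might look roughly like the sketch below. It is only an illustration under assumptions: the seeded sample data, the `Quarter` column name, and the `by_quarter` variable are made up for this example, and the output column names are borrowed from the expected table.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, seeded so the run is repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", periods=365)
df = pd.DataFrame({
    "Client": rng.integers(1, 11, size=100),
    # Pick 100 distinct days by position, without replacement.
    "OrderDate": dates[rng.choice(len(dates), size=100, replace=False)],
    "Value": rng.integers(10, 250, size=100),
})

# Extra column holding the calendar quarter of each order, e.g. 2013Q1.
df["Quarter"] = df["OrderDate"].dt.to_period("Q")

# Plain groupby on the two columns, with named aggregations.
by_quarter = (
    df.groupby(["Client", "Quarter"])["Value"]
      .agg(NumberofEntries="count", SumofValues="sum")
      .reset_index()
)
print(by_quarter.head())
```

Swapping `"Q"` for `"M"` or `"Y"` in `to_period` would give the monthly or yearly version of the same table.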

I've read the documentation and reviewed a TimeSeries demonstration by Wes, but I don't see a way to do a groupby for the Client and then apply the TimeSeries operations over the time periods I am trying to build. Alternatively, I could run a for loop and build the dataframe that way, but again, that seems like more work than it should be.

Is there a way to combine a groupby process with TimeSeries?

Recommended answer

A slight alternative is to set_index before doing the groupby:

In [11]: df.set_index('OrderDate', inplace=True)

In [12]: g = df.groupby('Client')

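# The how= argument of resample was removed from pandas; chain .agg instead.
# (pandas 2.2+ spells the quarter-end alias 'QE' rather than 'Q'.)
In [13]: g['Value'].resample('Q').agg(['sum', 'count'])
Out[13]: 
                   sum  count
Client OrderDate             
1      2013-03-31  239      1
       2013-06-30   83      1
       2013-09-30  249      2
       2013-12-31  506      3
2      2013-03-31  581      4
       2013-06-30  569      4
       2013-09-30  316      4
       2013-12-31  465      5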
...

Note: you don't need to do the sort before doing this.
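As a rough sketch of how the same idea could cover all three periods from the question and then be made 'wide', one option is `pd.Grouper` with explicit offset objects, which avoids depending on how a given pandas version spells the frequency aliases. The seeded sample data and the names `freqs`, `summaries` and `wide_quarterly` are assumptions made for this example, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, seeded so the run is repeatable.
rng = np.random.default_rng(0)
dates = pd.date_range("2013-01-01", periods=365)
df = pd.DataFrame({
    "Client": rng.integers(1, 11, size=100),
    "OrderDate": dates[rng.choice(len(dates), size=100, replace=False)],
    "Value": rng.integers(10, 250, size=100),
})

# Offset objects instead of string aliases such as 'M', 'Q' or 'Y'.
freqs = {
    "monthly": pd.offsets.MonthEnd(),
    "quarterly": pd.offsets.QuarterEnd(startingMonth=12),
    "yearly": pd.offsets.YearEnd(),
}

# One long-format summary per period: count and sum of Value,
# grouped by Client and by the period-end date of OrderDate.
summaries = {
    name: df.groupby(["Client", pd.Grouper(key="OrderDate", freq=offset)])["Value"]
            .agg(["count", "sum"])
    for name, offset in freqs.items()
}

# "Wide" layout: one row per client, one column per (statistic, quarter end).
wide_quarterly = summaries["quarterly"].unstack("OrderDate", fill_value=0)
print(wide_quarterly)
```

Here `pd.Grouper(key="OrderDate", ...)` does the time binning on a regular column, so no `set_index` step is needed in this variant.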
