Calculating year over year growth by group in Pandas


Problem description

I have the following dataframe:

In [1]: df
Out[1]: 

ID            Month           Transaction_Amount

1             2013/01         10
1             2013/02         20
1             2013/03         10
1             2013/04         20
1             2013/05         10
1             2013/06         20
1             2013/07         10
1             2013/08         20
1             2013/09         10
1             2013/10         20
1             2013/11         10
1             2013/12         20
1             2014/01         15
1             2014/02         25
1             2014/03         15
1             2014/04         25
...
1             2014/11         15
1             2014/12         25
...
10000000      2014/11         13
10000000      2014/12         23

What I would like to do is calculate the growth over rolling month periods year over year, so for example, I would want to find the value of (2014/01 - 2013/01) / (2014/01) which is (15 - 10) / (15) = 1/3 and save this for the first rolling period. There will be a total of 12 rolling periods for each ID. I'm thinking that the final output should look like:

In [2]: df_new
Out[2]: 

ID       rolling_period_1   rolling_period_2  ... rolling_period_12

1        .333333            .2                    .2
2        x1                 x2                    x12
3        y1                 y2                    y12
4        z1                 z2                    z12
...

I generated a list of tuples, one for each year-over-year pair of months [(2013/01, 2014/01), (2013/02, 2014/02) ... (2013/12, 2014/12)], and have been playing around with isin to index a subset of the original df, but I am unsure how to arrive at df_new.

EDIT

I have created a new dataframe called temp_df with the following code:

In [4]: temp_df = df[df['Month'].isin(('2013/01','2014/01'))]

In [5]: temp_df
Out[5]:

ID            Month           Transaction_Amount

1             2013/01         10
1             2014/01         15
2             2013/01         20
2             2014/01         30
3             2013/01         15
3             2014/01         30
...

What I would like to produce is a DataFrame that looks like the following:

In [6]: new_df
Out[6]:

ID            Transaction_Growth

1             .3333   # (15-10)/15
2             .3333   # (30-20)/30
3             .50     # (30-15)/30
...
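
For this single pair of months, one possible way to get from temp_df to new_df is to pivot the two months into columns and apply the formula above. This is only a minimal sketch, not part of the original question: it rebuilds the sample rows shown in temp_df by hand, and it assumes each ID has exactly one row per month.

```python
import pandas as pd

# Sample rows copied from the temp_df shown above.
temp_df = pd.DataFrame({
    "ID": [1, 1, 2, 2, 3, 3],
    "Month": ["2013/01", "2014/01"] * 3,
    "Transaction_Amount": [10, 15, 20, 30, 15, 30],
})

# One row per ID, one column per month.
wide = temp_df.pivot(index="ID", columns="Month", values="Transaction_Amount")

# Growth as defined in the question: (current - prior) / current.
growth = (wide["2014/01"] - wide["2013/01"]) / wide["2014/01"]
new_df = growth.rename("Transaction_Growth").reset_index()
```

Pivoting fails if an ID has duplicate rows for the same month, so real data may need to be aggregated first.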

Solution

You can use shift to offset the rows in the DataFrame.

Create dummy data with a month column and a value column:

import pandas as pd

# Month-end dates; pandas 2.2+ prefers the alias 'ME' over 'M'
rng = pd.date_range('1/1/2011', periods=90, freq='M')
df = pd.DataFrame({'value': range(1, 91), 'date': rng})

Set the month column as the index:

df = df.set_index('date')

Shift a copy of the DataFrame by 12 periods to get the value from 12 months earlier, subtract it from the current record, and divide by the current record:

(df - df.shift(12)) / df
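
The shift-and-divide step can be checked on a smaller frame. The sketch below is an illustration, not the answer's own code: it uses 24 months instead of 90 and the month-start alias 'MS' (an assumption made here so the snippet runs unchanged across pandas versions). The parentheses matter, because division binds tighter than subtraction.

```python
import pandas as pd

# 24 monthly rows with values 1..24 (smaller than the answer's dummy data).
rng = pd.date_range("2011-01-01", periods=24, freq="MS")
df = pd.DataFrame({"value": range(1, 25)}, index=rng)

prior = df.shift(12)          # value from 12 rows (months) earlier
growth = (df - prior) / df    # subtract first, then divide by the current value
```

The first 12 rows of growth are NaN because they have no value from a year earlier to compare against.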

Note the sign of the period passed to shift: a positive value (12) pulls in the value from 12 rows earlier.

Updated to take ID into account:

import numpy as np
import pandas as pd

# Create range of months
rng = pd.date_range('1/1/2011', periods=180, freq='M')
ID = np.array([1, 2, 3])

# Create ID column: each ID repeated for 60 consecutive rows
ID = np.repeat(ID, 60)

# Create dummy data in dataframe
df = pd.DataFrame({'ID': ID, 'value': range(1, 181), 'date': rng})

# Use shift on a groupby object so values are only shifted within each ID
(df.value - df.groupby(['ID']).value.shift(12)) / df.value
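
To get from the grouped shift to the wide df_new layout described in the question, the long result can be numbered per ID and pivoted. The following is a sketch under stated assumptions rather than the answerer's code: the amounts for ID 2 are hypothetical, the helper column names growth and period are made up for illustration, and every ID is assumed to have all 24 consecutive months with no gaps (otherwise shift(12) would not line up with the same month one year earlier).

```python
import pandas as pd

# Months formatted like the question's data: "2013/01" ... "2014/12".
months = [f"{y}/{m:02d}" for y in (2013, 2014) for m in range(1, 13)]
id1 = [10, 20] * 6 + [15, 25] * 6   # ID 1 amounts as listed in the question
id2 = [20, 40] * 6 + [30, 50] * 6   # hypothetical amounts for a second ID
df = pd.DataFrame({
    "ID": [1] * 24 + [2] * 24,
    "Month": months * 2,
    "Transaction_Amount": id1 + id2,
})

# "YYYY/MM" strings sort chronologically, so sorting puts each ID in date order.
df = df.sort_values(["ID", "Month"])
prior = df.groupby("ID")["Transaction_Amount"].shift(12)
df["growth"] = (df["Transaction_Amount"] - prior) / df["Transaction_Amount"]

# Keep rows with a prior-year value, number them 1..12 per ID, then pivot wide.
out = df.dropna(subset=["growth"]).copy()
out["period"] = out.groupby("ID").cumcount() + 1
df_new = out.pivot(index="ID", columns="period", values="growth")
df_new.columns = [f"rolling_period_{p}" for p in df_new.columns]
```

If some IDs have missing months, merging each row with the row for the same ID and the month one year earlier is a safer alternative to a positional shift.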
