Want to find Year on Year calculation using Groupby and apply for various years

Problem description

I have a dataframe as follows:

        MARKET     PRODUCT  TIMEPERIOD        DATE  VALUES
0   USA MARKET       APPLE     QUARTER  2020-06-01     100
1   USA MARKET       APPLE      YEARLY  2020-06-01    1000
2   USA MARKET        PEAR     QUARTER  2020-06-01     200
3   USA MARKET        PEAR      YEARLY  2020-06-01    5000
4   USA MARKET       APPLE     QUARTER  2019-06-01     300
5   USA MARKET       APPLE      YEARLY  2019-06-01    2000
6   USA MARKET        PEAR     QUARTER  2019-06-01     100
7   USA MARKET        PEAR      YEARLY  2019-06-01    3000
8   USA MARKET       APPLE     QUARTER  2018-06-01     300
9   USA MARKET       APPLE      YEARLY  2018-06-01    2000
10  USA MARKET        PEAR     QUARTER  2018-06-01     100
11  USA MARKET        PEAR      YEARLY  2018-06-01    3000
12   UK MARKET  WATERMELON     QUARTER  2020-06-01     200
13   UK MARKET  WATERMELON      YEARLY  2020-06-01    5000
14   UK MARKET       GRAPE     QUARTER  2020-06-01     200
15   UK MARKET       GRAPE      YEARLY  2020-06-01    5000
16   UK MARKET  WATERMELON     QUARTER  2019-06-01     500
17   UK MARKET  WATERMELON      YEARLY  2019-06-01     300
18   UK MARKET       GRAPE     QUARTER  2019-06-01      50
19   UK MARKET       GRAPE      YEARLY  2019-06-01     500
20   UK MARKET  WATERMELON     QUARTER  2018-06-01     500
21   UK MARKET  WATERMELON      YEARLY  2018-06-01     300
22   UK MARKET       GRAPE     QUARTER  2018-06-01      50
23   UK MARKET       GRAPE      YEARLY  2018-06-01     500

I want to find the year-on-year difference of each product in each market for each time period (that's a mouthful!). For example, for product APPLE in USA MARKET with TIMEPERIOD QUARTER, the 2020-06-01 growth rate is simply (100 - 300) / 300 = -66.7%, that is, the 2020-06-01 value minus the 2019-06-01 value, divided by the 2019-06-01 value.
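The growth-rate arithmetic described here can be sketched as a small helper. The name `yoy_growth` and the choice to return `None` when the previous value is zero are assumptions made for illustration only; they are not part of the original question.

```python
def yoy_growth(curr_value, prev_value):
    """Year-on-year growth in percent: (current - previous) / previous * 100."""
    if prev_value == 0:
        # Growth is undefined when the previous value is zero;
        # returning None is an illustrative choice, not the asker's behavior.
        return None
    return (curr_value - prev_value) / prev_value * 100


# APPLE, USA MARKET, QUARTER: 100 on 2020-06-01 vs 300 on 2019-06-01
apple_quarter_growth = yoy_growth(100, 300)
print(round(apple_quarter_growth, 2))  # -66.67
```

The result is negative because the value fell from 300 to 100.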

As you can see, the problem with the code below is that it only returns the growth rate of the current year vs. the past year, and ignores the past year vs. the year before it (2019 vs. 2018). I have tried a few if-else blocks, but they all lead to errors; I would appreciate any neat solution. In short, my growth_rate_prev is unused here (I did try to weave it in, but it failed).

from dateutil.relativedelta import relativedelta

def year_on_year(df):
    try:
        # Sum of VALUES at the latest date, one year earlier, and two years earlier
        curr_year_val = df[df['DATE'] == max(df['DATE'])]['VALUES'].sum()
        prev_year_val = df[df['DATE'] == (max(df['DATE']) - relativedelta(months=12))]['VALUES'].sum()
        prev_prev_year_val = df[df['DATE'] == (max(df['DATE']) - relativedelta(months=24))]['VALUES'].sum()

        growth_rate_curr = (curr_year_val - prev_year_val) / prev_year_val
        growth_rate_prev = (prev_year_val - prev_prev_year_val) / prev_prev_year_val
    except ZeroDivisionError:
        growth_rate_curr, growth_rate_prev = 0, 0

    # Only the latest growth rate is returned; growth_rate_prev is computed but unused
    return growth_rate_curr


def product_growth(applied_group_df):
    # Assigns the same scalar to every row of the group
    applied_group_df['Year on Year difference'] = year_on_year(applied_group_df)
    return applied_group_df

growth_rate_df = df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(product_growth)

In case anyone wants to reproduce this, you can create the df using the code below:

import pandas as pd

df_list_for_yoy = [['USA MARKET', 'APPLE', 'QUARTER', '2020-06-01', 100], ['USA MARKET', 'APPLE', 'YEARLY', '2020-06-01', 1000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2020-06-01', 200],  ['USA MARKET', 'PEAR', 'YEARLY', '2020-06-01', 5000], 
           ['USA MARKET', 'APPLE', 'QUARTER', '2019-06-01', 300],  ['USA MARKET', 'APPLE', 'YEARLY', '2019-06-01', 2000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2019-06-01', 100],  ['USA MARKET', 'PEAR', 'YEARLY', '2019-06-01', 3000],
           ['USA MARKET', 'APPLE', 'QUARTER', '2018-06-01', 300],  ['USA MARKET', 'APPLE', 'YEARLY', '2018-06-01', 2000],
           ['USA MARKET', 'PEAR', 'QUARTER', '2018-06-01', 100],  ['USA MARKET', 'PEAR', 'YEARLY', '2018-06-01', 3000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2020-06-01', 200],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2020-06-01', 5000], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2020-06-01', 200],    ['UK MARKET', 'GRAPE', 'YEARLY', '2020-06-01', 5000],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2019-06-01', 500],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2019-06-01', 300], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2019-06-01', 50],    ['UK MARKET', 'GRAPE', 'YEARLY', '2019-06-01', 500],
           ['UK MARKET', 'WATERMELON', 'QUARTER', '2018-06-01', 500],  ['UK MARKET', 'WATERMELON', 'YEARLY', '2018-06-01', 300], 
           ['UK MARKET', 'GRAPE', 'QUARTER', '2018-06-01', 50],    ['UK MARKET', 'GRAPE', 'YEARLY', '2018-06-01', 500]]

column_names = ['MARKET', 'PRODUCT', 'TIMEPERIOD', 'DATE', 'VALUES']
df_2 = pd.DataFrame(df_list_for_yoy, columns = column_names)
df_2['DATE']= pd.to_datetime(df_2['DATE'])

Recommended answer

You can use itertools.combinations to get the year-to-year pairs, together with further manipulation inside a function applied to each group, like this:

import numpy as np
import pandas as pd
from itertools import combinations

def get_annual_growth(grp):
    # Get all possible pairs of years in the group, each pair sorted ascending
    year_comb_lists = np.sort([sorted(comb) for comb in combinations(grp.DATE.dt.year, 2)])
    # Keep only consecutive-year pairs (drops, for example, 2018-2020)
    year_comb_lists = year_comb_lists[(np.diff(year_comb_lists) == 1).flatten()]  # comment this line out if all pairs are wanted
    # Build year-pair labels such as '2019-2020'
    year_comb_strings = ['-'.join(map(str, comb)) for comb in year_comb_lists]

    # Build a sub-dataframe; pandas `groupby` concatenates these afterwards
    subdf = pd.DataFrame(columns=['Annual Reference', 'Annual Growth (%)'])
    for i, years in enumerate(year_comb_lists):  # for each year pair ...
        actual_value = grp[grp.DATE.dt.year == years[1]].VALUES.mean()
        last_value = grp[grp.DATE.dt.year == years[0]].VALUES.mean()
        # Annual growth as a percentage, rounded to two decimals
        growth = round((actual_value - last_value) / last_value * 100, 2)
        subdf.loc[i, :] = [year_comb_strings[i], growth]
    return subdf

df_2.groupby(['TIMEPERIOD', 'MARKET', 'PRODUCT']).apply(get_annual_growth)

Output:

                                   Annual Reference Annual Growth (%)
TIMEPERIOD MARKET     PRODUCT                                        
QUARTER    UK MARKET  GRAPE      0        2019-2020               300
                                 1        2018-2019                 0
                      WATERMELON 0        2019-2020               -60
                                 1        2018-2019                 0
           USA MARKET APPLE      0        2019-2020            -66.67
                                 1        2018-2019                 0
                      PEAR       0        2019-2020               100
                                 1        2018-2019                 0
YEARLY     UK MARKET  GRAPE      0        2019-2020               900
                                 1        2018-2019                 0
                      WATERMELON 0        2019-2020           1566.67
                                 1        2018-2019                 0
           USA MARKET APPLE      0        2019-2020               -50
                                 1        2018-2019                 0
                      PEAR       0        2019-2020             66.67
                                 1        2018-2019                 0
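
As an alternative to looping over year pairs, the same year-on-year comparison can be sketched with a self-merge: each (group, year) row is joined to the row for the previous year in the same group. This is not the answer's method. The function name `yoy_by_merge`, the helper columns `YEAR` and `PREV_VALUES`, and the decision to sum values within a year are assumptions made for illustration; only the column names `TIMEPERIOD`, `MARKET`, `PRODUCT`, `DATE` and `VALUES` come from the question.

```python
import pandas as pd


def yoy_by_merge(df, keys=("TIMEPERIOD", "MARKET", "PRODUCT")):
    """Year-on-year growth (%) per group via a self-merge on YEAR and YEAR + 1."""
    keys = list(keys)
    cur = df.assign(YEAR=df["DATE"].dt.year)
    # Sum within each year in case a group has several rows for the same year
    cur = cur.groupby(keys + ["YEAR"], as_index=False)["VALUES"].sum()
    # Shift the year forward by one so that each row lines up with the following year
    prev = cur.assign(YEAR=cur["YEAR"] + 1).rename(columns={"VALUES": "PREV_VALUES"})
    # Inner join keeps only years that have a previous year in the same group
    out = cur.merge(prev, on=keys + ["YEAR"], how="inner")
    out["YoY Growth (%)"] = (
        (out["VALUES"] - out["PREV_VALUES"]) / out["PREV_VALUES"] * 100
    ).round(2)
    return out


# Small sample taken from the question's data: APPLE, USA MARKET, QUARTER
sample = pd.DataFrame(
    [
        ["USA MARKET", "APPLE", "QUARTER", "2020-06-01", 100],
        ["USA MARKET", "APPLE", "QUARTER", "2019-06-01", 300],
        ["USA MARKET", "APPLE", "QUARTER", "2018-06-01", 300],
    ],
    columns=["MARKET", "PRODUCT", "TIMEPERIOD", "DATE", "VALUES"],
)
sample["DATE"] = pd.to_datetime(sample["DATE"])

result = yoy_by_merge(sample)
print(result)
```

Because the join is on the calendar year, a missing year simply produces no row for the following year, rather than comparing against an older value. Calling `yoy_by_merge(df_2)` on the full dataframe from the question should give one row per group and year with a predecessor.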
