How to create rolling percentage for groupby DataFrame

Question

I am trying to calculate the percent change by month for each product. Here is what I have so far. I have this working for a DataFrame involving a single product. I am stumped on how to get the calculation applied to a result set that contains many products and many months.

Sample DataFrame:

product_desc    activity_month    prod_count
product_a       1/1/2014          53
product_b       1/1/2014          42
product_c       1/1/2014          38
product_a       2/1/2014          26
product_b       2/1/2014          48
product_c       2/1/2014          39
product_a       3/1/2014          41
product_b       3/1/2014          35
product_c       3/1/2014          50

What I need to get out is the dataframe with a percentage change by product_desc by month added to it:

product_desc    activity_month   prod_count pct_change
product_a       1/1/2014         53 
product_a       2/1/2014         26         0.490566038
product_a       3/1/2014         41         1.576923077
product_b       1/1/2014         42 
product_b       2/1/2014         48         1.142857143
product_b       3/1/2014         35         0.729166667
product_c       1/1/2014         38 
product_c       2/1/2014         39         1.026315789
product_c       3/1/2014         50         1.282051282

I can calculate this on a dataframe with a single product_desc with this:

df['change_rate1'] = df['prod_count'].shift(-1)/df['prod_count']
df['pct_change'] = df['change_rate1'].shift(1)
df = df.drop(columns='change_rate1')

Here is what I am trying now:

df_grouped = df.groupby(['product_desc','activity_month'])

for product_desc, activity_month in df_grouped:
   df['change_rate1'] = df_grouped['prod_count'].shift(-1)/df_grouped['prod_count']

However, I get back a 'NotImplementedError' on the last line in the for statement.

Any advice on how to get this calculated correctly is appreciated.

Recommended answer

It looks like within each group there is one observation per month, and you want the change from one month to the next. You can get that by grouping on 'product_desc' and then calling the built-in pct_change() method on the grouped column:

>>> df['pct_ch'] = df.groupby('product_desc')['prod_count'].pct_change() + 1

Note that I added 1 to the result of pct_change(), because it computes the net percent change (new / old - 1) and your expected output is the ratio new / old. I'll print out a sorted version so it matches your expected output:

>>> df.sort_values('product_desc')

  product_desc activity_month  prod_count    pct_ch
0    product_a     2014-01-01          53       NaN
3    product_a     2014-02-01          26  0.490566
6    product_a     2014-03-01          41  1.576923
1    product_b     2014-01-01          42       NaN
4    product_b     2014-02-01          48  1.142857
7    product_b     2014-03-01          35  0.729167
2    product_c     2014-01-01          38       NaN
5    product_c     2014-02-01          39  1.026316
8    product_c     2014-03-01          50  1.282051
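
For reference, here is a self-contained sketch of the same calculation that rebuilds the sample data from the question. The date parsing and the explicit sort by product and month are my additions, on the assumption that real data may not arrive in chronological order; pct_change() compares each row with the previous row of its group, so row order matters.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "product_desc": ["product_a", "product_b", "product_c"] * 3,
    "activity_month": ["1/1/2014"] * 3 + ["2/1/2014"] * 3 + ["3/1/2014"] * 3,
    "prod_count": [53, 42, 38, 26, 48, 39, 41, 35, 50],
})

# Assumption: the dates are month/day/year strings
df["activity_month"] = pd.to_datetime(df["activity_month"], format="%m/%d/%Y")

# Put each product's rows in chronological order before comparing months
df = df.sort_values(["product_desc", "activity_month"])

# pct_change() gives new / old - 1 within each product; add 1 to get new / old
df["pct_ch"] = df.groupby("product_desc")["prod_count"].pct_change() + 1

print(df)
```

The first month of each product has no previous row to compare against, so its pct_ch is NaN.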

On older versions of pandas you might have to do:

>>> df['pct_ch'] = df.groupby('product_desc')['prod_count'].apply(lambda x: x.pct_change() + 1)

Or you could use shift as you suggest with a small modification:

>>> df['pct_ch'] = df['prod_count'] / df.groupby('product_desc')['prod_count'].shift(1)
>>> df.sort_values('product_desc')

  product_desc activity_month  prod_count    pct_ch
0    product_a     2014-01-01          53       NaN
3    product_a     2014-02-01          26  0.490566
6    product_a     2014-03-01          41  1.576923
1    product_b     2014-01-01          42       NaN
4    product_b     2014-02-01          48  1.142857
7    product_b     2014-03-01          35  0.729167
2    product_c     2014-01-01          38       NaN
5    product_c     2014-02-01          39  1.026316
8    product_c     2014-03-01          50  1.282051
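
As a quick sanity check, the sketch below computes the ratio both ways on the question's sample data and compares them. The variable names are mine, and the comparison uses a small tolerance because the two expressions can differ in the last floating-point digit.

```python
import pandas as pd

# Sample data copied from the question (already in month order per product)
df = pd.DataFrame({
    "product_desc": ["product_a", "product_b", "product_c"] * 3,
    "activity_month": ["1/1/2014"] * 3 + ["2/1/2014"] * 3 + ["3/1/2014"] * 3,
    "prod_count": [53, 42, 38, 26, 48, 39, 41, 35, 50],
})

grouped = df.groupby("product_desc")["prod_count"]

# Current value divided by the previous value within the same product
ratio_shift = df["prod_count"] / grouped.shift(1)

# Net percent change within each product, plus 1
ratio_pct = grouped.pct_change() + 1

# Both are NaN for each product's first month; elsewhere they agree up to rounding
same_nans = ratio_shift.isna().equals(ratio_pct.isna())
close = bool(((ratio_shift - ratio_pct).abs().fillna(0) < 1e-12).all())
print(same_nans, close)
```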

You don't need to refer to df['prod_count'] within a groupby; you're not doing anything to that column.

On older versions of pandas you might have to do:

>>> df['pct_ch'] = df.groupby('product_desc')['prod_count'].apply(lambda x: x/x.shift(1))
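
One caveat on the apply-based forms: as far as I know, on recent pandas releases apply() may return a result whose index also contains the group key, in which case assigning it straight back to a column can fail. A sketch of an alternative that avoids this is to use transform(), which returns a result aligned to the original index. This is a substitution on my part, not something the versions above needed.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "product_desc": ["product_a", "product_b", "product_c"] * 3,
    "activity_month": ["1/1/2014"] * 3 + ["2/1/2014"] * 3 + ["3/1/2014"] * 3,
    "prod_count": [53, 42, 38, 26, 48, 39, 41, 35, 50],
})

# transform() runs the function once per product and keeps the original row index,
# so the result can be assigned directly as a new column
df["pct_ch"] = (
    df.groupby("product_desc")["prod_count"]
      .transform(lambda s: s / s.shift(1))
)

print(df.sort_values("product_desc"))
```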
