Pandas Group By with Running Total


Problem Description

My granny has some strange ideas. Every birthday she takes me shopping, and she has some strict rules. If I buy a present that costs less than $20, she won't contribute anything. If I spend over $20, she will contribute the amount above $20, up to a total of $30 per birthday.

So if a present costs $27, she would contribute $7.

That leaves $23 of granny's contribution available for extra presents that birthday; the same rules apply to any additional presents.

Once the $30 is spent, granny makes no more contributions and I must pay the rest myself.

Here is an example table of my 11th, 12th and 13th birthdays.

                         DollarsSpent  granny_pays
BirthDayAge PresentNum                           
11          1                  25.00         5.00   -- I used up $5
            2                 100.00        25.00   -- I used up the last $25
            3                  10.00         0.00
            4                  50.00         0.00
12          1                  39.00        19.00  -- I used up $19 only $11 left
            2                   7.00         0.00
            3                  32.00        11.00  -- I used up the last $11 despite $12 of $32 above the $20 starting point
            4                  19.00         0.00
13          1                  21.00         1.00  -- used up $1
            2                  27.00         7.00  -- used up $7, total used up $8 and never spent last $22
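
To make the rules concrete before vectorizing them, here is a minimal reference sketch in plain Python. It uses an explicit loop, which is exactly what the question hopes to avoid in pandas, so treat it only as a specification to check other solutions against. The function name `granny_contributions` and its parameters are illustrative assumptions, not part of the original post.

```python
def granny_contributions(spent, threshold=20.0, limit=30.0):
    """Return granny's contribution for each present of one birthday, in order.

    Hypothetical helper: `threshold` is the amount granny never pays toward,
    `limit` is the most she contributes across the whole birthday.
    """
    remaining = limit
    pays = []
    for amount in spent:
        over = max(amount - threshold, 0.0)   # part of the price above $20
        contribution = min(over, remaining)   # cannot exceed what is left of $30
        pays.append(contribution)
        remaining -= contribution
    return pays


birthday_11 = granny_contributions([25.0, 100.0, 10.0, 50.0])
birthday_12 = granny_contributions([39.0, 7.0, 32.0, 19.0])
birthday_13 = granny_contributions([21.0, 27.0])
print(birthday_11)  # [5.0, 25.0, 0.0, 0.0]
print(birthday_12)  # [19.0, 0.0, 11.0, 0.0]
print(birthday_13)  # [1.0, 7.0]
```

The printed values match the granny_pays column of the table above.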

So in pandas I have gotten this far.

import numpy as np
import pandas as pd

granny_wont_pay_first = 20.  # granny pays nothing toward the first $20 of a present
granny_limit = 30.           # the most granny will contribute

df = pd.DataFrame({'BirthDayAge' : ['11','11','11','11','12','12','12','12','13','13']
                   ,'PresentNum' : [1,2,3,4,1,2,3,4,1,2]
                   ,'DollarsSpent' : [25.,100.,10.,50.,39.,7.,32.,19.,21.,27.]
              })

df.set_index(['BirthDayAge','PresentNum'],inplace=True)
# Amount of each present above the $20 threshold (may be negative).
df['granny_pays'] = df['DollarsSpent'] - granny_wont_pay_first
df['granny_limit'] = granny_limit
df['zero'] = 0.0
# The median of (amount, 0, 30) clamps the amount to the range [0, 30].
df['granny_pays'] =  df[['granny_pays','zero','granny_limit']].apply(np.median,axis=1)
df.drop(['granny_limit','zero'], axis=1, inplace=True)

print(df.head(len(df)))

And this is the output. Taking the median of the three numbers (the amount over $20, 0, and 30) is a neat way to work out what granny would contribute to a single present.
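
As a small aside, the median-of-three trick is equivalent to clamping a value to the range [0, 30]. The sketch below compares it with `Series.clip` on the presents from the 11th birthday; the variable names are illustrative and this is not the original poster's code.

```python
import numpy as np
import pandas as pd

spent = pd.Series([25., 100., 10., 50.])
over = spent - 20.  # amount above the $20 threshold, may be negative

# Median of (value, 0, 30): the middle of the three is the clamped value.
via_median = over.apply(lambda x: np.median([x, 0.0, 30.0]))

# The same clamp expressed directly with Series.clip.
via_clip = over.clip(lower=0.0, upper=30.0)

print(via_median.tolist())  # [5.0, 30.0, 0.0, 30.0]
print(via_clip.tolist())    # [5.0, 30.0, 0.0, 30.0]
```

Both forms still treat every present on its own, so neither accounts for the $30 being shared across a birthday.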

The problem is that each present is treated in isolation, so I don't correctly erode the $30 allowance from present to present within each BirthDayAge.

                         DollarsSpent  granny_pays
BirthDayAge PresentNum                           
11          1                  25.00         5.00
            2                 100.00        30.00 -- should be 25.0
            3                  10.00         0.00
            4                  50.00        30.00 -- should be 0.0
12          1                  39.00        19.00
            2                   7.00         0.00
            3                  32.00        12.00 -- should be 11.0
            4                  19.00         0.00
13          1                  21.00         1.00
            2                  27.00         7.00

I am trying to think of a nice pandas way to do this erosion, preferably without loops.

Recommended Answer

I don't know if there is a more concise way, but this should work and does avoid loops as requested.

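import numpy as np

# Amount of each present above $20, floored at 0 (ignores the $30 cap).
df['per_gift'] = df.DollarsSpent - 20
df['per_gift'] = np.where( df.per_gift > 0, df.per_gift, 0 )

# Running total of that amount within each birthday, capped at $30.
df['per_bday'] = df.groupby('BirthDayAge').per_gift.cumsum()
df['per_bday'] = np.where( df.per_bday > 30, 30, df.per_bday )

# Granny's share of each present is the increase in the capped running total;
# the first present of each birthday has no previous row, so use per_bday itself.
df['granny_pays'] = df.groupby('BirthDayAge').per_bday.diff()
df['granny_pays'] = df.granny_pays.fillna(df.per_bday)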

Note that 'per_gift' ignores the maximum subsidy of $30, and 'per_bday' is the cumulative subsidy (capped at $30) per 'BirthDayAge'.

  BirthDayAge  DollarsSpent  PresentNum  per_gift  per_bday  granny_pays
0          11            25           1         5         5            5
1          11           100           2        80        30           25
2          11            10           3         0        30            0
3          11            50           4        30        30            0
4          12            39           1        19        19           19
5          12             7           2         0        19            0
6          12            32           3        12        30           11
7          12            19           4         0        30            0
8          13            21           1         1         1            1
9          13            27           2         7         8            7
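
For reference, the same cumulative-cap idea can be written as a self-contained script. This is a sketch under a few assumptions rather than the answerer's exact code: it uses `Series.clip` in place of `np.where`, keeps the intermediate values in local variables instead of columns, and relies on the rows already being in present order within each birthday.

```python
import pandas as pd

df = pd.DataFrame({
    'BirthDayAge': ['11', '11', '11', '11', '12', '12', '12', '12', '13', '13'],
    'PresentNum': [1, 2, 3, 4, 1, 2, 3, 4, 1, 2],
    'DollarsSpent': [25., 100., 10., 50., 39., 7., 32., 19., 21., 27.],
})

# Amount of each present above the $20 threshold, floored at 0.
per_gift = (df['DollarsSpent'] - 20.0).clip(lower=0.0)

# Running total within each birthday, capped at the $30 limit.
per_bday = per_gift.groupby(df['BirthDayAge']).cumsum().clip(upper=30.0)

# Each present's share is the step in the capped running total; the first
# present of a birthday has no previous row, so its share is the total itself.
df['granny_pays'] = per_bday.groupby(df['BirthDayAge']).diff().fillna(per_bday)

print(df['granny_pays'].tolist())
# [5.0, 25.0, 0.0, 0.0, 19.0, 0.0, 11.0, 0.0, 1.0, 7.0]
```

This works because the capped running total equals the total granny has paid so far, so the difference between consecutive rows is what she pays for the current present.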
