Arithmetic operation on a groupby pandas dataframe


Question

I have a pandas DataFrame with 40 columns and 400,000 rows. I created a rolled-up dataset grouped on 3 columns.

Now I need to compute a percentage metric based on two of the columns. Python raises this error:

unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'

Here is the sample code:

print(sample_data)
   date  part  receipt  bad_dollars  total_dollars  bad_percent
0     1   123       22           40            100          NaN
1     2   456       44           80            120          NaN
2     3   134       33           30            150          NaN
3     1   123       22           80            100          NaN
4     5   456       45           40             90          NaN
5     3   134       33           85            150          NaN
6     7   123       24           70            120          NaN
7     5   456       45           20             85          NaN
8     9   134       35           50            300          NaN
9     7   123       24          300            600          NaN

sample_data_group = sample_data.groupby(['date','part','receipt'])

sample_data_group['bad_percents']=sample_data_group['bad_dollars']/sample_data_group['total_dollars']

TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'

Please help!

Recommended answer

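A SeriesGroupBy object describes a grouping rather than holding a Series of values, so it does not support arithmetic operators such as /. The division has to be done on the actual columns within each group. You can do this by calling apply on the groupby object: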

import pandas as pd
import numpy as np

cols = ['index', 'date',  'part',  'receipt',  'bad_dollars',  'total_dollars',
        'bad_percent']
sample_data = pd.DataFrame([
[0,     1,   123,       22,           40,            100,          np.nan],
[1,     2,   456,       44,           80,            120,          np.nan],
[2,     3,   134,       33,           30,            150,          np.nan],
[3,     1,   123,       22,           80,            100,          np.nan],
[4,     5,   456,       45,           40,             90,          np.nan],
[5,     3,   134,       33,           85,            150,          np.nan],
[6,     7,   123,       24,           70,            120,          np.nan],
[7,     5,   456,       45,           20,             85,          np.nan],
[8,     9,   134,       35,           50,            300,          np.nan],
[9,     7,   123,       24,          300,            600,          np.nan]],
                           columns = cols).set_index('index', drop = True)

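# Group on the three key columns.
sample_data_group = sample_data.groupby(['date', 'part', 'receipt'])

# Within each group, divide the two columns row by row and store the result
# in bad_percent, then drop the group-key index levels added by apply.
# Note: whether apply adds those levels has differed between pandas versions.
xx = (sample_data_group
      .apply(lambda x: x.assign(bad_percent=x.bad_dollars / x.total_dollars))
      .reset_index(['date', 'part', 'receipt'], drop=True))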
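Two simpler variants may also be worth considering. Neither is part of the original answer. This is a minimal sketch that assumes the goal is either a row-level ratio (which needs no groupby at all) or a ratio of per-group totals (the "rolled-up" reading of the question). The names keys, row_level and rolled_up are illustrative only.

```python
import pandas as pd

# Same sample values as in the question (bad_percent omitted; it is computed below).
sample_data = pd.DataFrame({
    "date": [1, 2, 3, 1, 5, 3, 7, 5, 9, 7],
    "part": [123, 456, 134, 123, 456, 134, 123, 456, 134, 123],
    "receipt": [22, 44, 33, 22, 45, 33, 24, 45, 35, 24],
    "bad_dollars": [40, 80, 30, 80, 40, 85, 70, 20, 50, 300],
    "total_dollars": [100, 120, 150, 100, 90, 150, 120, 85, 300, 600],
})

keys = ["date", "part", "receipt"]  # hypothetical name for the grouping columns

# Variant 1: row-level ratio. Plain column arithmetic, no groupby involved.
row_level = sample_data.assign(
    bad_percent=sample_data["bad_dollars"] / sample_data["total_dollars"]
)

# Variant 2: rolled-up ratio. Sum both columns per group first, then divide
# the resulting ordinary columns.
rolled_up = sample_data.groupby(keys, as_index=False)[
    ["bad_dollars", "total_dollars"]
].sum()
rolled_up["bad_percent"] = rolled_up["bad_dollars"] / rolled_up["total_dollars"]

print(rolled_up)
```

The second variant gives one row per (date, part, receipt) combination. Because the division runs on the aggregated columns instead of inside a Python lambda, it is likely to scale better to a 400,000-row frame, though that has not been measured here.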
