Arithmetic operation on a groupby pandas dataframe
Question
I have a pandas DataFrame with 40 columns and 400,000 rows. I created a rolled-up dataset grouped on 3 columns.
Now I need to compute a percentage metric from two of the columns, but Python raises an error:
unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'
Here is the sample code:
print(sample_data)
   date  part  receipt  bad_dollars  total_dollars  bad_percent
0     1   123       22           40            100          NaN
1     2   456       44           80            120          NaN
2     3   134       33           30            150          NaN
3     1   123       22           80            100          NaN
4     5   456       45           40             90          NaN
5     3   134       33           85            150          NaN
6     7   123       24           70            120          NaN
7     5   456       45           20             85          NaN
8     9   134       35           50            300          NaN
9     7   123       24          300            600          NaN
sample_data_group = sample_data.groupby(['date','part','receipt'])
sample_data_group['bad_percents']=sample_data_group['bad_dollars']/sample_data_group['total_dollars']
TypeError: unsupported operand type(s) for /: 'SeriesGroupBy' and 'SeriesGroupBy'
Please help!
Recommended answer
You can do this by calling apply on the groupby object:
import pandas as pd
import numpy as np

cols = ['index', 'date', 'part', 'receipt', 'bad_dollars', 'total_dollars',
        'bad_percent']

# Rebuild the sample data from the question, using 'index' as the row index.
sample_data = pd.DataFrame([
    [0, 1, 123, 22, 40, 100, np.nan],
    [1, 2, 456, 44, 80, 120, np.nan],
    [2, 3, 134, 33, 30, 150, np.nan],
    [3, 1, 123, 22, 80, 100, np.nan],
    [4, 5, 456, 45, 40, 90, np.nan],
    [5, 3, 134, 33, 85, 150, np.nan],
    [6, 7, 123, 24, 70, 120, np.nan],
    [7, 5, 456, 45, 20, 85, np.nan],
    [8, 9, 134, 35, 50, 300, np.nan],
    [9, 7, 123, 24, 300, 600, np.nan]],
    columns=cols).set_index('index', drop=True)

sample_data_group = sample_data.groupby(['date', 'part', 'receipt'])

# Within each group, divide the two columns and store the result in bad_percent.
xx = sample_data_group.apply(
    lambda x: x.assign(bad_percent=x.bad_dollars / x.total_dollars))\
    .reset_index(['date', 'part', 'receipt'], drop=True)
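The error in the question arises because `sample_data_group['bad_dollars']` is a `SeriesGroupBy` object, a grouping that has not been aggregated yet, and it does not support division. The following is a minimal alternative sketch, not part of the original answer. It assumes that "rolled up" means summing `bad_dollars` and `total_dollars` within each (`date`, `part`, `receipt`) group before taking the ratio. The name `rolled_up` is introduced here for illustration only.

```python
import pandas as pd

# Sample data from the question (the all-NaN bad_percent column is omitted).
sample_data = pd.DataFrame({
    "date": [1, 2, 3, 1, 5, 3, 7, 5, 9, 7],
    "part": [123, 456, 134, 123, 456, 134, 123, 456, 134, 123],
    "receipt": [22, 44, 33, 22, 45, 33, 24, 45, 35, 24],
    "bad_dollars": [40, 80, 30, 80, 40, 85, 70, 20, 50, 300],
    "total_dollars": [100, 120, 150, 100, 90, 150, 120, 85, 300, 600],
})

# Assumption: the roll-up is a per-group sum. Aggregating first turns the
# lazy groupby object into an ordinary DataFrame with one row per group.
rolled_up = (
    sample_data
    .groupby(["date", "part", "receipt"], as_index=False)[
        ["bad_dollars", "total_dollars"]
    ]
    .sum()
)

# Plain column arithmetic now works, because both operands are Series.
rolled_up["bad_percent"] = rolled_up["bad_dollars"] / rolled_up["total_dollars"]

print(rolled_up)
```

If the ratio is only needed per original row rather than per group, no groupby is required at all: `sample_data["bad_dollars"] / sample_data["total_dollars"]` divides the two columns element-wise.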