Pandas add calculated column to groupby result
Problem description
The Python script below computes the following:
- A report of the total revenue from each customer.
- A report for each customer showing how much they spent in each category.
I want to compute the sales tax component for each of the reports.
(All the items have a sales tax of 9.25%.)
```python
import pandas as pd
from io import StringIO

mystr = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""

df = pd.read_csv(StringIO(mystr), header=None, sep='|',
                 names=['Name', 'Category', 'Product', 'Sales'])

# Report 1
rep1 = df.groupby('Name')['Sales'].sum()

# Name
# Nitin    15.00
# Pedro    11.02
# Susie    70.15
# Name: Sales, dtype: float64

# Report 2
rep2 = df.groupby(['Name', 'Category'])['Sales'].sum()

# Name   Category
# Nitin  tobacco      15.00
# Pedro  fuel          9.60
#        groceries     1.42
# Susie  fuel         44.90
#        groceries    10.25
#        tobacco      15.00
# Name: Sales, dtype: float64
```
Recommended answer
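This is possible via vectorised pandas calculations. Pass `as_index=False` so that `groupby(...).sum()` returns a DataFrame with the group keys as ordinary columns, then add a `Tax` column by multiplying the `Sales` column by the tax rate: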
```python
import pandas as pd
from io import StringIO

mystr = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""

df = pd.read_csv(StringIO(mystr), header=None, sep='|',
                 names=['Name', 'Category', 'Product', 'Sales'])

# Report 1
rep1 = df.groupby('Name', as_index=False)['Sales'].sum()
rep1['Tax'] = rep1['Sales'] * 0.0925

#     Name  Sales       Tax
# 0  Nitin  15.00  1.387500
# 1  Pedro  11.02  1.019350
# 2  Susie  70.15  6.488875

# Report 2
rep2 = df.groupby(['Name', 'Category'], as_index=False)['Sales'].sum()
rep2['Tax'] = rep2['Sales'] * 0.0925

#     Name   Category  Sales       Tax
# 0  Nitin    tobacco  15.00  1.387500
# 1  Pedro       fuel   9.60  0.888000
# 2  Pedro  groceries   1.42  0.131350
# 3  Susie       fuel  44.90  4.153250
# 4  Susie  groceries  10.25  0.948125
# 5  Susie    tobacco  15.00  1.387500
```
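Two other ways to reach the same reports are sketched below; this is a minimal illustration, not part of the original answer. Option A keeps the question's original `groupby` call and turns the resulting Series into a DataFrame with `reset_index()` before adding the tax column. Option B computes the tax on each row first and then sums both `Sales` and `Tax` per group using named aggregation (available in pandas 0.25 and later). Because multiplication by a constant rate distributes over addition, summing per-row tax gives the same total as taxing the group total, up to floating-point error. The constant `TAX_RATE` is a name introduced here for illustration, and rounding to cents with `.round(2)` is an added assumption: the question does not say how tax should be rounded. Note that pandas/NumPy round halves to even on the binary float value, so for strict currency rules Python's `decimal` module may be a better fit.

```python
import pandas as pd
from io import StringIO

# Same sample data as in the question.
mystr = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""
df = pd.read_csv(StringIO(mystr), header=None, sep='|',
                 names=['Name', 'Category', 'Product', 'Sales'])

TAX_RATE = 0.0925  # hypothetical constant: 9.25% on every item, per the question

# Option A: keep the question's Series result, then convert it to a DataFrame
# so that 'Name' becomes a regular column and a new column can be added.
rep1 = df.groupby('Name')['Sales'].sum().reset_index()
rep1['Tax'] = (rep1['Sales'] * TAX_RATE).round(2)  # assumption: round to cents

# Option B: add a per-row tax column first, then sum Sales and Tax per group
# with named aggregation (output column name = (input column, function)).
rep2 = (
    df.assign(Tax=df['Sales'] * TAX_RATE)
      .groupby(['Name', 'Category'], as_index=False)
      .agg(Sales=('Sales', 'sum'), Tax=('Tax', 'sum'))
)
rep2['Tax'] = rep2['Tax'].round(2)  # assumption: round to cents after summing

print(rep1)
print(rep2)
```

Option B is handy when other per-row quantities (discounts, different rates per category) need to be aggregated alongside `Sales`, since every derived column can be summed in the same `agg` call.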