Pandas percentage of total with groupby

Question

This is obviously simple, but as a numpy newbie I'm getting stuck.

I have a CSV file that contains 3 columns: the State, the Office ID, and the Sales for that office.

I want to calculate the percentage of sales per office in a given state (the percentages within each state should total 100%).

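import numpy as np
import pandas as pd

# range() must be wrapped in list() before repeating it on Python 3
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})

df.groupby(['state', 'office_id']).agg({'sales': 'sum'})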

This returns:

                  sales
state office_id        
AZ    2          839507
      4          373917
      6          347225
CA    1          798585
      3          890850
      5          454423
CO    1          819975
      3          202969
      5          614011
WA    2          163942
      4          369858
      6          959285

I can't seem to figure out how to "reach up" to the state level of the groupby to total up the sales for the entire state to calculate the fraction.

Recommended answer

Paul H's answer is right that you will have to make a second groupby object, but you can calculate the percentage in a simpler way: group state_office by its state level and divide the sales column by its sum within each group. Copying the beginning of Paul H's answer:

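# From Paul H
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': [np.random.randint(100000, 999999)
                             for _ in range(12)]})
# Total sales for each (state, office_id) pair
state_office = df.groupby(['state', 'office_id']).agg({'sales': 'sum'})
# Change: group state_office by state (index level 0) and divide by the sum.
# group_keys=False keeps the original (state, office_id) index on newer
# pandas versions, which would otherwise prepend the group key again.
state_pcts = state_office.groupby(level=0, group_keys=False).apply(
    lambda x: 100 * x / x.sum())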

Returns:

                     sales
state office_id           
AZ    2          16.981365
      4          19.250033
      6          63.768601
CA    1          19.331879
      3          33.858747
      5          46.809373
CO    1          36.851857
      3          19.874290
      5          43.273852
WA    2          34.707233
      4          35.511259
      6          29.781508
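
As a possible alternative to the apply call above, the same percentages can be sketched with transform, which broadcasts each state's total back onto every row so the division stays index-aligned. This is not part of the original answer; the names office_sales, state_totals, office_pct and pct_sums are illustrative, and the sample data is assumed to have the same shape as the question's.

```python
import numpy as np
import pandas as pd

# Same shape of sample data as above; the seed only makes the run repeatable.
np.random.seed(0)
df = pd.DataFrame({'state': ['CA', 'WA', 'CO', 'AZ'] * 3,
                   'office_id': list(range(1, 7)) * 2,
                   'sales': np.random.randint(100000, 999999, size=12)})

# Total sales per (state, office_id) pair, returned as a Series.
office_sales = df.groupby(['state', 'office_id'])['sales'].sum()

# transform('sum') repeats each state's total on every row of that state,
# so the result has the same index as office_sales.
state_totals = office_sales.groupby(level='state').transform('sum')
office_pct = 100 * office_sales / state_totals

# Sanity check: the percentages within each state should add up to 100.
pct_sums = office_pct.groupby(level='state').sum()
assert np.allclose(pct_sums, 100.0)
print(office_pct)
```

Because transform returns a result with the same index as its input, no lambda is needed and the (state, office_id) index is preserved as-is.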
