将小计添加到 Pandas Groupby [英] Adding Subtotals to Pandas Groupby

查看:71
本文介绍了将小计添加到 Pandas Groupby的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在寻找一种更简洁的方法来向 Pandas groupby 添加小计.

这是我的数据帧:

df = pd.DataFrame({'类别':np.random.choice(['A组','B组'], 50),'子类别':np.random.choice(['X','Y'], 50),'产品':np.random.choice(['产品 1','产品 2'], 50),'Units_Sold':np.random.randint(1,100, size=(50)),'Dollars_Sold':np.random.randint(100,1000, size=50),'日期':np.random.choice( pd.date_range('1/1/2011','03/31/2011',freq='D'), 50, 替换=假)})

从那里,我创建了一个新的 Groupby 数据框:

df1 = df.groupby(['Category','Sub-Category','Product',pd.TimeGrouper(key='Date',freq='M')]).agg({'Units_Sold':'sum','Dollars_Sold':'sum'}).unstack().fillna(0)

我想同时提供类别和类别的小计;子类别.我可以使用此代码执行此操作:

df2 = df1.groupby(level=[0,1]).sum()df2.index = pd.MultiIndex.from_arrays([df2.index.get_level_values(0),df2.index.get_level_values(1) + '总计',len(df2) * ['']])df3 = df1.groupby(level=[0]).sum()df3.index = pd.MultiIndex.from_arrays([df3.index.get_level_values(0) + 'Total',len(df3) * [''],len(df3) * ['']])pd.concat([df1,df2,df3]).sort_index()

这给了我想要的数据帧:

I am looking for a cleaner way to add subtotals to Pandas groupby.

Here is my DataFrame:

df = pd.DataFrame({
'Category':np.random.choice( ['Group A','Group B'], 50),
'Sub-Category':np.random.choice( ['X','Y'], 50),
'Product':np.random.choice( ['Product 1','Product 2'], 50),
'Units_Sold':np.random.randint(1,100, size=(50)),
'Dollars_Sold':np.random.randint(100,1000, size=50),
'Date':np.random.choice( pd.date_range('1/1/2011','03/31/2011',  
                      freq='D'), 50, replace=False)})

From there, I create a new Groupby Dataframe as such:

df1 = df.groupby(['Category','Sub-Category','Product',pd.TimeGrouper(key='Date',freq='M')]).agg({'Units_Sold':'sum','Dollars_Sold':'sum'}).unstack().fillna(0)

I would like to provide sub-totals for both Category & Sub-Category. I can do this using this code:

df2 = df1.groupby(level=[0,1]).sum()
df2.index = pd.MultiIndex.from_arrays([df2.index.get_level_values(0),
                                   df2.index.get_level_values(1) + ' Total',
                                   len(df2) * ['']])
df3 = df1.groupby(level=[0]).sum()
df3.index = pd.MultiIndex.from_arrays([df3.index.get_level_values(0) + ' Total',
                                   len(df3) * [''],
                                   len(df3) * ['']])
pd.concat([df1,df2,df3]).sort_index()

This gives me the DataFrame I want: Final DataFrame

My question - is there a more pythonic way to do this than to have to create a new DataFrame for each level then concat together? I have researched this, but can not find a better way. I have to do this for many different MultiIndex dataframes & am seeking a better solution.

Thanks in advance for your help!

EDIT WITH ADDITIONAL INFORMATION:

Thank you to both @Wen & @DaFanat for their replies. I attempted to use the link @Wen provided on my data [link]:Python (Pandas) Add subtotal on each lvl of multiindex dataframe

pd.concat([df.assign(\
    **{x: 'Total' for x in "CategorySub-CategoryProduct"[i:]}\
    ).groupby(list('abc')).sum() for i in range(1,4)])\
    .sort_index()

This sums the total, however it ignores the dates that make up the second level of the columns. It leaves me with this outcome.Resulting Image

I've tried to add in a TimeGrouper with the groupby, but that returns an error. Any help would be greatly appreciated. Thanks!

解决方案

I can get you a lot closer by aligning your attempt above with the example from @piRSquared.

The list must match the MultiIndex. Try this instead:

iList = ['Category','Sub-Category','Product']
pd.concat([
    df1.assign(
        **{x: '' for x in iList[i:]}
    ).groupby(iList).sum() for i in range(1,4)
]).sort_index()

It doesn't present the word "Total" in the right place, nor are the totals at the bottom of each group, but at least it's more-or-less functionally correct. My totals won't match because the values in the DataFrame are random.

It took me a while to work through the original answer provided in Python (Pandas) Add subtotal on each lvl of multiindex dataframe. But the same logic applies here.

The assign() replaces the values in the columns with what is in the dict that is returned by the dict comprehension executed over the elements of the list of MultiIndex columns.
Then groupby() only finds unique values for those non-blanked-out columns and sums them accordingly.
These groupbys are enclosed in a list comprehension, so pd.concat() then just combines these sets of rows.
And sort_index() puts the index labels in ascending order.
(Yes, you still get a warning about "both a column name and an index level," but it still works.)

这篇关于将小计添加到 Pandas Groupby的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆