Add unique groups to DF for each row including sum from other columns

Problem description

I have a DataFrame that looks like this:

ID     field_1     area_1    field_2       area_2    field_3     area_3    field_4      area_4
1      scoccer     500       basketball    200       swimming    100       basketball   50
2      volleyball  100       np.nan        np.nan    np.nan      np.nan    np.nan       np.nan
3      basketball  1000      football      10        np.nan      np.nan    np.nan       np.nan
4      swimming    280       swimming      200       basketball  320       np.nan       np.nan
5      volleyball  110       football      160       volleyball  30        np.nan       np.nan 

The original DataFrame has the same structure, but with columns field_1 through field_30 and area_1 through area_30.

I'd like to add columns to the DataFrame that group each row horizontally by the distinct values in the 'field_x' columns, with the corresponding areas summed up. The added columns should look like this:

ID   group_1     area_1     group_2     area_2     group_3    area_3
        
1    scoccer     500        basketball  250        swimming   100
2    volleyball  100 
3    basketball  1000       football    10
4    swimming    480        basketball  320         
5    volleyball  140        football    160

Is there an easy way to do this?

Recommended answer

Use pd.wide_to_long to reshape the DataFrame into long format, which lets you group by ID and field and sum the areas. Then number the groups within each ID with cumcount to create the column labels, and use pivot_table to return to the wide format.

import pandas as pd

# Reshape to long format, then sum the areas per (ID, field) pair.
df = (pd.wide_to_long(df, i='ID', j='num', stubnames=['field', 'area'], sep='_')
        .groupby(['ID', 'field'])['area'].sum()
        .reset_index())
#   ID       field    area
#0   1  basketball   250.0
#1   1     scoccer   500.0
#2   1    swimming   100.0
#3   2  volleyball   100.0
#4   3  basketball  1000.0
#5   3    football    10.0
#6   4  basketball   320.0
#7   4    swimming   480.0
#8   5    football   160.0
#9   5  volleyball   140.0

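# Number the groups within each ID (1, 2, 3, ...) to label the wide columns.
df['idx'] = df.groupby('ID').cumcount() + 1
df = (pd.pivot_table(df, index='ID', columns='idx', values=['field', 'area'],
                     aggfunc='first')
        .sort_index(axis=1, level=1))
# Flatten the (name, idx) MultiIndex columns into 'area_1', 'field_1', ...
df.columns = ['_'.join(map(str, tup)) for tup in df.columns]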


    area_1     field_1  area_2     field_2  area_3   field_3
ID                                                          
1    250.0  basketball   500.0     scoccer   100.0  swimming
2    100.0  volleyball     NaN         NaN     NaN       NaN
3   1000.0  basketball    10.0    football     NaN       NaN
4    320.0  basketball   480.0    swimming     NaN       NaN
5    160.0    football   140.0  volleyball     NaN       NaN
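
The result above lists each row's groups alphabetically, because groupby sorts its keys by default, whereas the layout requested in the question lists them in order of first appearance. Below is a minimal sketch of one way to keep that order, using the sample data from the question. The names raw, long_df, summed and wide are illustrative only, and pivot is swapped in for pivot_table here because each (ID, idx) pair is unique.

```python
import numpy as np
import pandas as pd

# Sample data from the question (spelling kept as posted).
raw = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'field_1': ['scoccer', 'volleyball', 'basketball', 'swimming', 'volleyball'],
    'area_1': [500, 100, 1000, 280, 110],
    'field_2': ['basketball', np.nan, 'football', 'swimming', 'football'],
    'area_2': [200, np.nan, 10, 200, 160],
    'field_3': ['swimming', np.nan, np.nan, 'basketball', 'volleyball'],
    'area_3': [100, np.nan, np.nan, 320, 30],
    'field_4': ['basketball', np.nan, np.nan, np.nan, np.nan],
    'area_4': [50, np.nan, np.nan, np.nan, np.nan],
})

# Long format, explicitly ordered by ID and original column position.
long_df = (pd.wide_to_long(raw, i='ID', j='num',
                           stubnames=['field', 'area'], sep='_')
             .reset_index()
             .sort_values(['ID', 'num']))

# sort=False keeps groups in order of first appearance; NaN fields are dropped.
summed = (long_df.groupby(['ID', 'field'], sort=False)['area'].sum()
                 .reset_index()
                 .rename(columns={'field': 'group'}))
summed['idx'] = summed.groupby('ID').cumcount() + 1

# Back to wide format, with group_k placed before area_k for each k.
wide = summed.pivot(index='ID', columns='idx', values=['group', 'area'])
n_groups = int(summed['idx'].max())
order = [(name, k) for k in range(1, n_groups + 1) for name in ('group', 'area')]
wide = wide.reindex(columns=pd.MultiIndex.from_tuples(order))
wide.columns = [f'{name}_{k}' for name, k in wide.columns]

print(wide)
```

Because pivot is given a mix of string and numeric columns, the area columns may come back with object dtype; converting them with pd.to_numeric afterwards is one option if numeric dtypes matter.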


Just for fun, you could use the undocumented pd.lreshape instead of wide_to_long.

# Here df is the original wide DataFrame, before any reshaping.
# Change range to (1,31) for your real data.
pd.lreshape(df, {'area': [f'area_{i}' for i in range(1,5)],
                 'field': [f'field_{i}' for i in range(1,5)]})

#    ID    area       field
#0    1   500.0     scoccer
#1    2   100.0  volleyball
#2    3  1000.0  basketball
#3    4   280.0    swimming
#4    5   110.0  volleyball
#5    1   200.0  basketball
#....
#10   4   320.0  basketball
#11   5    30.0  volleyball
#12   1    50.0  basketball
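
To show how the lreshape variant plugs into the rest of the approach, here is a self-contained sketch that rebuilds the question's sample data and then applies the same groupby and sum step. The names raw, n_pairs, long_df and summed are illustrative only. It relies on lreshape dropping rows that contain NaN by default (dropna=True).

```python
import numpy as np
import pandas as pd

# Sample data from the question (spelling kept as posted).
raw = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'field_1': ['scoccer', 'volleyball', 'basketball', 'swimming', 'volleyball'],
    'area_1': [500, 100, 1000, 280, 110],
    'field_2': ['basketball', np.nan, 'football', 'swimming', 'football'],
    'area_2': [200, np.nan, 10, 200, 160],
    'field_3': ['swimming', np.nan, np.nan, 'basketball', 'volleyball'],
    'area_3': [100, np.nan, np.nan, 320, 30],
    'field_4': ['basketball', np.nan, np.nan, np.nan, np.nan],
    'area_4': [50, np.nan, np.nan, np.nan, np.nan],
})

n_pairs = 4  # number of field/area column pairs; 30 for the real data

# Stack area_1..area_n into 'area' and field_1..field_n into 'field'.
long_df = pd.lreshape(raw, {
    'area': [f'area_{i}' for i in range(1, n_pairs + 1)],
    'field': [f'field_{i}' for i in range(1, n_pairs + 1)],
})

# Same aggregation as in the wide_to_long version.
summed = long_df.groupby(['ID', 'field'])['area'].sum().reset_index()
print(summed)
```

From here, the cumcount and pivot_table steps shown earlier apply unchanged.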
