Pandas groupby result into multiple columns

Question

I have a dataframe that I want to group, and then partition the values within each group into multiple columns.

For example, say I have the following dataframe:

>>> import pandas as pd
>>> import numpy as np
>>> df=pd.DataFrame()
>>> df['Group']=['A','C','B','A','C','C']
>>> df['ID']=[1,2,3,4,5,6]
>>> df['Value']=np.random.randint(1,100,6)
>>> df
  Group  ID  Value
0     A   1     66
1     C   2      2
2     B   3     98
3     A   4     90
4     C   5     85
5     C   6     38
>>> 

I want to group by the "Group" field, get the sum of the "Value" field, and get new fields, each of which holds one of the ID values of the group.

Currently I am able to do this as follows, but I am looking for a cleaner approach:

First, I create a dataframe with a list of the IDs in each group.

>>> g=df.groupby('Group')
>>> result=g.agg({'Value':np.sum, 'ID':lambda x:x.tolist()})
>>> result
              ID  Value
Group                  
A         [1, 4]    156
B            [3]     98
C      [2, 5, 6]    125
>>> 

Then I use pd.Series to split those lists into columns, rename the columns, and join them back onto the result.

>>> id_df=result.ID.apply(lambda x:pd.Series(x))
>>> id_cols=['ID'+str(x) for x in range(1,len(id_df.columns)+1)]
>>> id_df.columns=id_cols
>>> 
>>> result.join(id_df)[id_cols+['Value']]
       ID1  ID2  ID3  Value
Group                      
A        1    4  NaN    156
B        3  NaN  NaN     98
C        2    5    6    125
>>> 

Is there a way to do this without first having to create the list of values?

Recommended answer

You could use

id_df = grouped['ID'].apply(lambda x: pd.Series(x.values)).unstack()

to create id_df without the intermediate result DataFrame.
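To see what that one-liner does, it can help to look at the Series it produces before unstack() is called. The following is a minimal sketch, assuming a small fixed frame with the same Group/ID layout as the question; the name `stacked` is introduced here only for illustration.

```python
import pandas as pd

# Toy frame with the same Group/ID layout as the question (Value omitted).
df = pd.DataFrame({'Group': ['A', 'C', 'B', 'A', 'C', 'C'],
                   'ID': [1, 2, 3, 4, 5, 6]})

# Step 1: rebuild each group's IDs as a Series with a fresh 0..n-1 index.
# The pieces are concatenated into one Series indexed by
# (Group, position within the group).
stacked = df.groupby('Group')['ID'].apply(lambda x: pd.Series(x.values))
print(stacked)

# Step 2: unstack() moves the inner index level (the position) into columns.
# Positions that a group does not have are filled with NaN.
id_df = stacked.unstack()
print(id_df)
```

Resetting the index inside the lambda (via `x.values`) is what makes the positions line up across groups; without it, each ID would keep its original row label and land in its own column.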

import pandas as pd
import numpy as np
np.random.seed(2016)

df = pd.DataFrame({'Group': ['A', 'C', 'B', 'A', 'C', 'C'],
                   'ID': [1, 2, 3, 4, 5, 6],
                   'Value': np.random.randint(1, 100, 6)})

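grouped = df.groupby('Group')
# Sum of Value per group
values = grouped['Value'].agg('sum')
# One row per group, one column per position of an ID within its group
id_df = grouped['ID'].apply(lambda x: pd.Series(x.values)).unstack()
# Rename the positional columns 0, 1, 2, ... to ID1, ID2, ID3, ...
id_df = id_df.rename(columns={i: 'ID{}'.format(i + 1) for i in range(id_df.shape[1])})
# Put the ID columns and the summed Value side by side
result = pd.concat([id_df, values], axis=1)
print(result)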

yields

       ID1  ID2  ID3  Value
Group                      
A        1    4  NaN     77
B        3  NaN  NaN     84
C        2    5    6     86
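A possible variation, not part of the answer above, avoids the per-group apply by numbering the rows within each group with cumcount() and pivoting on that number. This is only a sketch under stated assumptions: the Value column reuses the numbers printed in the question's frame instead of random data, and the helper column name `pos` is hypothetical.

```python
import pandas as pd

# Fixed data: Group/ID as in the question, Value copied from its printed frame.
df = pd.DataFrame({'Group': ['A', 'C', 'B', 'A', 'C', 'C'],
                   'ID': [1, 2, 3, 4, 5, 6],
                   'Value': [66, 2, 98, 90, 85, 38]})

# Number the rows within each group 1, 2, ... in order of appearance.
pos = df.groupby('Group').cumcount() + 1

# Pivot those positions into columns, then prefix them: ID1, ID2, ...
id_df = (df.assign(pos=pos)
           .pivot(index='Group', columns='pos', values='ID')
           .add_prefix('ID'))
id_df.columns.name = None  # drop the leftover 'pos' label on the column axis

# Sum Value per group and attach it to the ID columns.
values = df.groupby('Group')['Value'].sum()
result = id_df.join(values)
print(result)
```

Both approaches produce one row per group with NaN where a group has fewer IDs; the cumcount version mainly trades the lambda for built-in groupby and pivot operations.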
