Aggregation in Pandas
- How can I perform aggregation with Pandas?
- No DataFrame after aggregation! What happened?
- How can I aggregate mainly string columns (to lists, tuples, strings with separator)?
- How can I aggregate counts?
- How can I create a new column filled by aggregated values?
I've seen these recurring questions asking about various facets of the pandas aggregate functionality. Most of the information regarding aggregation and its various use cases today is fragmented across dozens of badly worded, unsearchable posts. The aim here is to collate some of the more important points for posterity.
This Q&A is meant to be the next instalment in a series of helpful user-guides:
- How to pivot a dataframe
- Pandas concat
- How do I operate on a DataFrame with a Series for every column?
- Pandas Merging 101
Please note that this post is not meant to be a replacement for the documentation about aggregation and about groupby, so please read that as well!
Question 1
How can I perform aggregation with Pandas?
Expanded aggregation documentation.
Aggregating functions are the ones that reduce the dimension of the returned objects. This means the output Series/DataFrame has fewer rows than the original, or the same number: one row per group.
Some common aggregating functions are tabulated below:
Function     Description
mean()       Compute mean of groups
sum()        Compute sum of group values
size()       Compute group sizes
count()      Compute count of group
std()        Standard deviation of groups
var()        Compute variance of groups
sem()        Standard error of the mean of groups
describe()   Generates descriptive statistics
first()      Compute first of group values
last()       Compute last of group values
nth()        Take nth value, or a subset if n is a list
min()        Compute min of group values
max()        Compute max of group values
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame({'A' : ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
'B' : ['one', 'two', 'three','two', 'two', 'one'],
'C' : np.random.randint(5, size=6),
'D' : np.random.randint(5, size=6),
'E' : np.random.randint(5, size=6)})
print (df)
A B C D E
0 foo one 2 3 0
1 foo two 4 1 0
2 bar three 2 1 1
3 foo two 1 0 3
4 bar two 3 1 4
5 foo one 2 1 0
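A few of the functions tabulated above can be tried side by side. This is only a minimal sketch: the values are hard-coded to match the frame printed above (so it does not depend on the random generator), and the names g and summary are made up for the illustration.

```python
import pandas as pd

# Same values as the printed frame, hard-coded instead of randomly generated.
df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
                   'B': ['one', 'two', 'three', 'two', 'two', 'one'],
                   'C': [2, 4, 2, 1, 3, 2],
                   'D': [3, 1, 1, 0, 1, 1],
                   'E': [0, 0, 1, 3, 4, 0]})

g = df.groupby('A')['C']

# Each call returns one value per group, so 6 input rows become 2 output rows.
summary = pd.DataFrame({'mean': g.mean(),
                        'min': g.min(),
                        'max': g.max(),
                        'first': g.first(),
                        'last': g.last(),
                        'size': g.size()})
print(summary)
```

The result is indexed by the group keys of A, which is the "reduced dimension" described above.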
Aggregating a single selected column with a built-in, Cython-implemented function:
df1 = df.groupby(['A', 'B'], as_index=False)['C'].sum()
print (df1)
A B C
0 bar three 2
1 bar two 3
2 foo one 4
3 foo two 5
If no column is selected, the aggregate function is applied to all columns that are not specified in the groupby function (here, all columns except A and B):
df2 = df.groupby(['A', 'B'], as_index=False).sum()
print (df2)
A B C D E
0 bar three 2 1 1
1 bar two 3 1 4
2 foo one 4 4 0
3 foo two 5 1 3
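You can also specify only some columns used for aggregation in a list after the groupby function (note the double brackets; selecting with a bare 'C','D' tuple is deprecated and fails in recent pandas versions):
df3 = df.groupby(['A', 'B'], as_index=False)[['C','D']].sum()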
print (df3)
A B C D
0 bar three 2 1
1 bar two 3 1
2 foo one 4 4
3 foo two 5 1
The same results can be obtained with the function DataFrameGroupBy.agg:
df1 = df.groupby(['A', 'B'], as_index=False)['C'].agg('sum')
print (df1)
A B C
0 bar three 2
1 bar two 3
2 foo one 4
3 foo two 5
df2 = df.groupby(['A', 'B'], as_index=False).agg('sum')
print (df2)
A B C D E
0 bar three 2 1 1
1 bar two 3 1 4
2 foo one 4 4 0
3 foo two 5 1 3
To apply multiple functions to one column, use a list of tuples, each holding the name of the new column and the aggregation function:
df4 = (df.groupby(['A', 'B'])['C']
.agg([('average','mean'),('total','sum')])
.reset_index())
print (df4)
A B average total
0 bar three 2.0 2
1 bar two 3.0 3
2 foo one 2.0 4
3 foo two 2.5 5
To apply multiple functions to every column, pass the same list of tuples without selecting a column:
df5 = (df.groupby(['A', 'B'])
.agg([('average','mean'),('total','sum')]))
print (df5)
C D E
average total average total average total
A B
bar three 2.0 2 1.0 1 1.0 1
two 3.0 3 1.0 1 4.0 4
foo one 2.0 4 2.0 4 0.0 0
two 2.5 5 0.5 1 1.5 3
The result then has a MultiIndex in its columns:
print (df5.columns)
MultiIndex(levels=[['C', 'D', 'E'], ['average', 'total']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]])
To flatten the MultiIndex into ordinary column names, use map with join:
df5.columns = df5.columns.map('_'.join)
df5 = df5.reset_index()
print (df5)
A B C_average C_total D_average D_total E_average E_total
0 bar three 2.0 2 1.0 1 1.0 1
1 bar two 3.0 3 1.0 1 4.0 4
2 foo one 2.0 4 2.0 4 0.0 0
3 foo two 2.5 5 0.5 1 1.5 3
Another solution is to pass a list of aggregate functions, flatten the MultiIndex, and then rename the columns with str.replace:
df5 = df.groupby(['A', 'B']).agg(['mean','sum'])
df5.columns = (df5.columns.map('_'.join)
.str.replace('sum','total')
.str.replace('mean','average'))
df5 = df5.reset_index()
print (df5)
A B C_average C_total D_average D_total E_average E_total
0 bar three 2.0 2 1.0 1 1.0 1
1 bar two 3.0 3 1.0 1 4.0 4
2 foo one 2.0 4 2.0 4 0.0 0
3 foo two 2.5 5 0.5 1 1.5 3
If you want to specify the aggregate function for each column separately, pass a dictionary:
df6 = (df.groupby(['A', 'B'], as_index=False)
.agg({'C':'sum','D':'mean'})
.rename(columns={'C':'C_total', 'D':'D_average'}))
print (df6)
A B C_total D_average
0 bar three 2 1.0
1 bar two 3 1.0
2 foo one 4 2.0
3 foo two 5 0.5
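The dictionary plus rename step can also be written in one call with named aggregation, where each keyword is the output column name and its value is a (column, function) pair. This is a sketch that assumes pandas 0.25 or newer; the data is hard-coded to match the frame printed earlier.

```python
import pandas as pd

# Same values as the printed frame, hard-coded instead of randomly generated.
df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
                   'B': ['one', 'two', 'three', 'two', 'two', 'one'],
                   'C': [2, 4, 2, 1, 3, 2],
                   'D': [3, 1, 1, 0, 1, 1]})

# keyword = name of the output column, value = (input column, function)
df6 = (df.groupby(['A', 'B'], as_index=False)
         .agg(C_total=('C', 'sum'),
              D_average=('D', 'mean')))
print(df6)
```

The output columns come out already named C_total and D_average, so no rename is needed.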
You can pass a custom function too. It receives the values of one group as a Series and must return a single value:
def func(x):
    return x.iat[0] + x.iat[-1]
df7 = (df.groupby(['A', 'B'], as_index=False)
.agg({'C':'sum','D': func})
.rename(columns={'C':'C_total', 'D':'D_sum_first_and_last'}))
print (df7)
A B C_total D_sum_first_and_last
0 bar three 2 2
1 bar two 3 2
2 foo one 4 4
3 foo two 5 1
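As a further illustration of custom functions, the sketch below computes the spread (max minus min) of each group, once with a named function and once with a lambda. The names value_range and spread are hypothetical, chosen only for this example, and the data is hard-coded to match the frame printed earlier.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
                   'C': [2, 4, 2, 1, 3, 2],
                   'D': [3, 1, 1, 0, 1, 1]})

def value_range(s):
    # s holds the values of one column for one group
    return s.max() - s.min()

spread = (df.groupby('A')
            .agg({'C': value_range,
                  'D': lambda s: s.max() - s.min()})
            .reset_index())
print(spread)
```

Custom Python functions are called once per group and column, so they are usually slower than the built-in string names such as 'sum' or 'mean'.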
Question 2
No DataFrame after aggregation! What happened?
Aggregation by two or more columns:
df1 = df.groupby(['A', 'B'])['C'].sum()
print (df1)
A B
bar three 2
two 3
foo one 4
two 5
Name: C, dtype: int32
First check the Index and type of the Pandas object:
print (df1.index)
MultiIndex(levels=[['bar', 'foo'], ['one', 'three', 'two']],
labels=[[0, 0, 1, 1], [1, 2, 0, 2]],
names=['A', 'B'])
print (type(df1))
<class 'pandas.core.series.Series'>
There are two solutions for converting a MultiIndex Series to columns:
- add the parameter as_index=False:
df1 = df.groupby(['A', 'B'], as_index=False)['C'].sum()
print (df1)
A B C
0 bar three 2
1 bar two 3
2 foo one 4
3 foo two 5
- use Series.reset_index:
df1 = df.groupby(['A', 'B'])['C'].sum().reset_index()
print (df1)
A B C
0 bar three 2
1 bar two 3
2 foo one 4
3 foo two 5
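The difference between the three variants can be checked directly. A small sketch, with the data hard-coded to match the frame printed earlier and the variable names via_flag and via_reset invented for the illustration:

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
                   'B': ['one', 'two', 'three', 'two', 'two', 'one'],
                   'C': [2, 4, 2, 1, 3, 2]})

s = df.groupby(['A', 'B'])['C'].sum()                           # Series, keys in the index
via_flag = df.groupby(['A', 'B'], as_index=False)['C'].sum()    # DataFrame
via_reset = s.reset_index()                                     # DataFrame

print(type(s).__name__, type(via_flag).__name__, type(via_reset).__name__)
print(list(via_flag.columns), list(via_reset.columns))
```

Both solutions end up with A, B and C as ordinary columns; which one to use is mostly a matter of taste.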
If you group by one column:
df2 = df.groupby('A')['C'].sum()
print (df2)
A
bar 5
foo 9
Name: C, dtype: int32
... you get a Series with an Index:
print (df2.index)
Index(['bar', 'foo'], dtype='object', name='A')
print (type(df2))
<class 'pandas.core.series.Series'>
The solution is the same as for the MultiIndex Series:
df2 = df.groupby('A', as_index=False)['C'].sum()
print (df2)
A C
0 bar 5
1 foo 9
df2 = df.groupby('A')['C'].sum().reset_index()
print (df2)
A C
0 bar 5
1 foo 9
Question 3
How can I aggregate mainly string columns (to lists, tuples, strings with separator)?
df = pd.DataFrame({'A' : ['a', 'c', 'b', 'b', 'a', 'c', 'b'],
'B' : ['one', 'two', 'three','two', 'two', 'one', 'three'],
'C' : ['three', 'one', 'two', 'two', 'three','two', 'one'],
'D' : [1,2,3,2,3,1,2]})
print (df)
A B C D
0 a one three 1
1 c two one 2
2 b three two 3
3 b two two 2
4 a two three 3
5 c one two 1
6 b three one 2
Instead of an aggregation function, it is possible to pass list, tuple or set to collect the values of each group into that container:
df1 = df.groupby('A')['B'].agg(list).reset_index()
print (df1)
A B
0 a [one, two]
1 b [three, two, three]
2 c [two, one]
An alternative is to use GroupBy.apply:
df1 = df.groupby('A')['B'].apply(list).reset_index()
print (df1)
A B
0 a [one, two]
1 b [three, two, three]
2 c [two, one]
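The same pattern with tuple and set, as a minimal sketch on the A and B columns of the frame above (the names as_tuple and as_set are made up for the example). A set drops duplicates and has no defined order, so its printed form may differ between runs.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'c', 'b', 'b', 'a', 'c', 'b'],
                   'B': ['one', 'two', 'three', 'two', 'two', 'one', 'three']})

# tuple keeps order and duplicates, set keeps only the distinct values
as_tuple = df.groupby('A')['B'].agg(tuple).reset_index()
as_set = df.groupby('A')['B'].agg(set).reset_index()

print(as_tuple)
print(as_set)
```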
For converting to strings with a separator, use .join, but only if it is a string column:
df2 = df.groupby('A')['B'].agg(','.join).reset_index()
print (df2)
A B
0 a one,two
1 b three,two,three
2 c two,one
If it is a numeric column, use a lambda function with astype for converting to strings:
df3 = (df.groupby('A')['D']
.agg(lambda x: ','.join(x.astype(str)))
.reset_index())
print (df3)
A D
0 a 1,3
1 b 3,2,2
2 c 2,1
Another solution is to convert to strings before the groupby:
df3 = (df.assign(D = df['D'].astype(str))
.groupby('A')['D']
.agg(','.join).reset_index())
print (df3)
A D
0 a 1,3
1 b 3,2,2
2 c 2,1
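For converting all columns, don't pass a list of column(s) after groupby.
In the output below there isn't any column D, because of the automatic exclusion of 'nuisance' columns: columns for which the function fails are dropped, here the numeric column D, whose values cannot be joined as strings. Recent pandas versions no longer drop such columns silently and raise a TypeError instead.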
df4 = df.groupby('A').agg(','.join).reset_index()
print (df4)
A B C
0 a one,two three,three
1 b three,two,three two,two,one
2 c two,one one,two
So it's necessary to convert all columns into strings, and then get all columns:
df5 = (df.groupby('A')
.agg(lambda x: ','.join(x.astype(str)))
.reset_index())
print (df5)
A B C D
0 a one,two three,three 1,3
1 b three,two,three two,two,one 3,2,2
2 c two,one one,two 2,1
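If only the text columns are wanted, a sketch that does not rely on automatic exclusion is to select them explicitly before aggregating. The list text_cols is picked by hand here and is an assumption of the example, not something pandas provides.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'c', 'b', 'b', 'a', 'c', 'b'],
                   'B': ['one', 'two', 'three', 'two', 'two', 'one', 'three'],
                   'C': ['three', 'one', 'two', 'two', 'three', 'two', 'one'],
                   'D': [1, 2, 3, 2, 3, 1, 2]})

# Hand-picked string columns; the numeric column D is left out on purpose.
text_cols = ['B', 'C']

df4 = df.groupby('A')[text_cols].agg(','.join).reset_index()
print(df4)
```

Because the numeric column never reaches ','.join, this variant behaves the same whether or not the installed pandas version still drops failing columns.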
Question 4
How can I aggregate counts?
df = pd.DataFrame({'A' : ['a', 'c', 'b', 'b', 'a', 'c', 'b'],
'B' : ['one', 'two', 'three','two', 'two', 'one', 'three'],
'C' : ['three', np.nan, np.nan, 'two', 'three','two', 'one'],
'D' : [np.nan,2,3,2,3,np.nan,2]})
print (df)
A B C D
0 a one three NaN
1 c two NaN 2.0
2 b three NaN 3.0
3 b two two 2.0
4 a two three 3.0
5 c one two NaN
6 b three one 2.0
Function GroupBy.size returns the size of each group, including rows with missing values:
df1 = df.groupby('A').size().reset_index(name='COUNT')
print (df1)
A COUNT
0 a 2
1 b 3
2 c 2
Function GroupBy.count excludes missing values:
df2 = df.groupby('A')['C'].count().reset_index(name='COUNT')
print (df2)
A COUNT
0 a 2
1 b 2
2 c 1
The same function can be applied to multiple columns at once to count the non-missing values in each:
df3 = df.groupby('A').count().add_suffix('_COUNT').reset_index()
print (df3)
A B_COUNT C_COUNT D_COUNT
0 a 2 2 1
1 b 3 2 3
2 c 2 1 1
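Putting size and count next to each other makes the difference visible: their gap is the number of missing values per group. A minimal sketch on columns A and C of the frame above; the column name missing is invented for the illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'c', 'b', 'b', 'a', 'c', 'b'],
                   'C': ['three', np.nan, np.nan, 'two', 'three', 'two', 'one']})

g = df.groupby('A')['C']

# size counts all rows of the group, count only the non-missing ones
out = pd.DataFrame({'size': g.size(), 'count': g.count()})
out['missing'] = out['size'] - out['count']
print(out.reset_index())
```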
A related function is Series.value_counts. It returns a Series containing counts of unique values in descending order, so that the first element is the most frequently-occurring element. It excludes NaN values by default.
df4 = (df['A'].value_counts()
.rename_axis('A')
.reset_index(name='COUNT'))
print (df4)
A COUNT
0 b 3
1 a 2
2 c 2
If you want the same output as with groupby + size, add Series.sort_index:
df5 = (df['A'].value_counts()
.sort_index()
.rename_axis('A')
.reset_index(name='COUNT'))
print (df5)
A COUNT
0 a 2
1 b 3
2 c 2
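Since value_counts skips NaN by default, missing values can be counted as their own entry with the dropna=False parameter. A small sketch on the values of column C from the frame above (the Series s is built by hand for the example):

```python
import numpy as np
import pandas as pd

s = pd.Series(['three', np.nan, np.nan, 'two', 'three', 'two', 'one'], name='C')

without_nan = s.value_counts()            # NaN rows are ignored
with_nan = s.value_counts(dropna=False)   # NaN gets its own count

print(without_nan)
print(with_nan)
```

With dropna=False the counts add up to the length of the Series, which mirrors the difference between GroupBy.count and GroupBy.size.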
Question 5
How can I create a new column filled by aggregated values?
Method GroupBy.transform returns an object that is indexed the same (same size) as the one being grouped, so the aggregated value is repeated for every row of its group.
See the Pandas documentation for more information.
np.random.seed(123)
df = pd.DataFrame({'A' : ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
'B' : ['one', 'two', 'three','two', 'two', 'one'],
'C' : np.random.randint(5, size=6),
'D' : np.random.randint(5, size=6)})
print (df)
A B C D
0 foo one 2 3
1 foo two 4 1
2 bar three 2 1
3 foo two 1 0
4 bar two 3 1
5 foo one 2 1
df['C1'] = df.groupby('A')['C'].transform('sum')
df['C2'] = df.groupby(['A','B'])['C'].transform('sum')
df[['C3','D3']] = df.groupby('A')[['C','D']].transform('sum')
df[['C4','D4']] = df.groupby(['A','B'])[['C','D']].transform('sum')
print (df)
A B C D C1 C2 C3 D3 C4 D4
0 foo one 2 3 9 4 9 5 4 4
1 foo two 4 1 9 5 9 5 5 1
2 bar three 2 1 5 2 5 2 2 1
3 foo two 1 0 9 5 9 5 5 1
4 bar two 3 1 5 3 5 2 3 1
5 foo one 2 1 9 4 9 5 4 4
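Because the transformed result lines up with the original rows, it can be combined with them directly. The sketch below uses this for two common follow-ups: each row's share of its group total, and a filter for rows above their group mean. The names group_total, C_share and above_mean are made up for the example, and the data is hard-coded to match the frame printed above.

```python
import pandas as pd

df = pd.DataFrame({'A': ['foo', 'foo', 'bar', 'foo', 'bar', 'foo'],
                   'C': [2, 4, 2, 1, 3, 2]})

# One total per row, repeated within each group
group_total = df.groupby('A')['C'].transform('sum')
df['C_share'] = df['C'] / group_total

# Boolean mask built by comparing each row with its own group mean
above_mean = df[df['C'] > df.groupby('A')['C'].transform('mean')]

print(df)
print(above_mean)
```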