Pandas merge on aggregated columns
Question
Let's say I create a DataFrame:
import pandas as pd
df = pd.DataFrame({"a": [1,2,3,13,15], "b": [4,5,6,6,6], "c": ["wish", "you","were", "here", "here"]})
which looks like this:
    a  b     c
0   1  4  wish
1   2  5   you
2   3  6  were
3  13  6  here
4  15  6  here
... and then group and aggregate by a couple of columns ...
gb = df.groupby(['b','c']).agg({"a": lambda x: x.nunique()})
which produces the following result:
        a
b c
4 wish  1
5 you   1
6 here  2
  were  1
Is it possible to merge df with the newly aggregated table gb so that I create a new column in df containing the corresponding values from gb? Like this:
    a  b     c  nc
0   1  4  wish   1
1   2  5   you   1
2   3  6  were   1
3  13  6  here   2
4  15  6  here   2
I tried doing the simplest thing:
df.merge(gb, on=['b','c'])
But this results in an error:
KeyError: 'b'
This makes sense, because the grouped table has a MultiIndex and b is not a column. So my question is two-fold:
- Can I transform the MultiIndex of the gb DataFrame back into columns (so that it has the b and c columns)?
- Can I merge df with gb on the column names?
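To make the index-versus-column distinction concrete, here is a minimal sketch that rebuilds gb from the example data and inspects where b and c ended up. The variable names index_names and column_names are introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 13, 15],
    "b": [4, 5, 6, 6, 6],
    "c": ["wish", "you", "were", "here", "here"],
})

# Same aggregation as in the question.
gb = df.groupby(["b", "c"]).agg({"a": lambda x: x.nunique()})

# The grouping keys become levels of the row index, not columns.
index_names = list(gb.index.names)   # illustrative helper variable
column_names = list(gb.columns)      # illustrative helper variable

print(index_names)
print(column_names)
```

Only a is a regular column here; b and c are levels of the row index.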
Recommended answer
Whenever you want to add an aggregated column from a groupby operation back to the df, you should use transform. This produces a Series whose index is aligned with your original df:
In [4]:
df['nc'] = df.groupby(['b','c'])['a'].transform(pd.Series.nunique)
df
Out[4]:
    a  b     c  nc
0   1  4  wish   1
1   2  5   you   1
2   3  6  were   1
3  13  6  here   2
4  15  6  here   2
There is no need to reset the index or perform an additional merge.
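For completeness, the two literal questions (turning the index back into columns, then merging on column names) can also be answered directly. The following is a sketch under stated assumptions, not part of the original answer: it uses reset_index to restore b and c as columns, renames the aggregate to nc to avoid a clash with the existing a column, and uses how="left" so the rows of df keep their order. The names merged and via_transform are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [1, 2, 3, 13, 15],
    "b": [4, 5, 6, 6, 6],
    "c": ["wish", "you", "were", "here", "here"],
})

# Aggregate, name the result "nc", then move the (b, c) index levels
# back into ordinary columns so they can be used as merge keys.
gb = (
    df.groupby(["b", "c"])["a"]
    .nunique()
    .rename("nc")
    .reset_index()
)

# A left merge keeps every row of df, in its original order.
merged = df.merge(gb, on=["b", "c"], how="left")

# The transform-based approach from the answer, for comparison.
via_transform = df.groupby(["b", "c"])["a"].transform("nunique")

print(merged)
print(via_transform.tolist())
```

Both routes give the same nc values for this data. The transform version is shorter and returns a Series already aligned with the index of df, which is why the answer prefers it.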