pandas groupby: can I select an agg function by one level of a column MultiIndex?
I have a pandas DataFrame with a MultiIndex of columns:
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(np.random.randn(4, 6),
                  index=[0, 0, 1, 1],
                  columns=columns)
print(df)
          a                             b
          0         1         2         0         1         2
0  0.582804  0.753118 -0.900950 -0.914657 -0.333091 -0.965912
0  0.498002 -0.842624  0.155783  0.559730 -0.300136 -1.211412
1  0.727019  1.522160  1.679025  1.738350  0.593361  0.411907
1  1.253759 -0.806279 -2.177582 -0.099210 -0.839822 -0.211349
I want to group by the index, and use the 'min' aggregation on the 'a' columns and the 'sum' aggregation on the 'b' columns.
I know I can do this by creating a dict that specifies the agg function for each column:
agg_dict = {'a': 'min', 'b': 'sum'}
full_agg_dict = {(c, i): agg_dict[c] for c in ['a', 'b'] for i in range(3)}
print(df.groupby(level=0).agg(full_agg_dict))
          a                             b
          0         1         2         0         1         2
0  0.498002 -0.842624 -0.900950 -0.354927 -0.633227 -2.177324
1  0.727019 -0.806279 -2.177582  1.639140 -0.246461  0.200558
Is there a simpler way? It seems like there should be a way to do this with agg_dict without using full_agg_dict.
I would use your approach as well. But here's another way that (should) work:
(df.stack(level=1)
   .groupby(level=[0, 1])
   .agg({'a': 'min', 'b': 'sum'})
   .unstack(-1)
)
For some reason groupby(level=[0,1]) doesn't work for me, so I came up with:
(df.stack(level=1)
   .reset_index()
   .groupby(['level_0', 'level_1'])
   .agg({'a': 'min', 'b': 'sum'})
   .unstack('level_1')
)
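If the stack/unstack round trip feels heavy, a middle ground may be to keep the question's dict approach but derive the per-column dict from df.columns instead of spelling out the labels by hand. The sketch below is not from the original answer. It relies on the fact that iterating over MultiIndex columns yields tuples, so col[0] is the top-level label. The seeded random generator is an assumption added only to make the example repeatable.

```python
import numpy as np
import pandas as pd

# Same shape as the frame in the question; the seed is an arbitrary choice
# made here so the example is repeatable.
rng = np.random.default_rng(0)
columns = pd.MultiIndex.from_tuples(
    [(c, i) for c in ['a', 'b'] for i in range(3)])
df = pd.DataFrame(rng.standard_normal((4, 6)),
                  index=[0, 0, 1, 1],
                  columns=columns)

agg_dict = {'a': 'min', 'b': 'sum'}

# Each column label is a tuple such as ('a', 0); its first element is the
# top-level label, which selects the aggregation from agg_dict.
full_agg_dict = {col: agg_dict[col[0]] for col in df.columns}

result = df.groupby(level=0).agg(full_agg_dict)
print(result)
```

This still builds a full dict, but it no longer hard-codes the column labels, so it keeps working if the second level gains or loses entries.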