Pandas groupby custom groups
Question
Let's say I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6], 'B': ['a', 'a', 'b', 'b', 'c', 'c']})
print(df)
A B
0 1 a
1 2 a
2 3 b
3 4 b
4 5 c
5 6 c
How can I group by column B such that the groups are a, a OR b, and a OR b OR c, rather than just a, b, and c? For the sake of the example, let's say I want to aggregate the results with 'sum'. I would then end up with:
A
a 3
a OR b 10
a OR b OR c 21
Recommended answer
I think it really depends on the function you want to use.
For a sum, one trick is DataFrame.expanding: compute the expanding (running) sum, then use DataFrame.where to keep only the rows at which an entire group has been included, that is, the last row of each B value. This relies on the rows of each B value being contiguous, as they are here.
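# Select the numeric column first; recent pandas versions no longer silently drop the string column B
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1)))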
A
0 NaN
1 3.0
2 NaN
3 10.0
4 NaN
5 21.0
Dropping the NaN rows leaves one row per cumulative group:
df[['A']].expanding().sum().where(df['B'].ne(df['B'].shift(-1))).loc[lambda x: x.A.notna()]
A
1 3.0
3 10.0
5 21.0
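The same idea can be checked end to end with a small self-contained sketch. It is only an illustration: it works on the Series df['A'] rather than the whole frame, assumes the rows of each B value are contiguous as in the example, and the names running and is_group_end are made up here.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

# Running total over A only; the string column B is left out of the sum.
running = df['A'].expanding().sum()

# True on the last row of each run of equal B values.
# The final row is compared against a missing value, which also counts as "not equal".
is_group_end = df['B'].ne(df['B'].shift(-1))

# Keep the running total only at the group boundaries.
result = running[is_group_end]
print(result)
```

If B is not sorted, sorting by B first (for example with df.sort_values('B', kind='stable')) would restore the contiguity this relies on.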
Update
We can also combine DataFrame.groupby with DataFrame.expanding: sum within each group first, then take the expanding sum of the per-group totals.
df.groupby('B').sum().expanding().sum()
To get the expected output:
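new_df = (df.groupby('B').sum().expanding().sum()
            .reset_index()
            # Build cumulative labels 'a or ', 'a or b or ', ... and cut off the trailing ' or '.
            # Slicing is used because str.rstrip(' or ') strips any trailing ' ', 'o' or 'r'
            # characters, which would damage labels that end in one of those letters.
            .assign(B=lambda x: x.B.add(' or ').cumsum().str[:-len(' or ')])
            .set_index('B'))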
print(new_df)
A
B
a 3.0
a or b 10.0
a or b or c 21.0
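Both approaches above work because a sum can be accumulated from partial results. For an aggregation where that does not hold (a median, say), a more direct but slower option is to recompute the aggregate over each growing set of labels. The sketch below is one possible way to do that; cumulative_group_agg is a hypothetical helper written for this example, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3, 4, 5, 6],
                   'B': ['a', 'a', 'b', 'b', 'c', 'c']})

def cumulative_group_agg(frame, key, value, func):
    """Aggregate `value` over growing unions of the `key` groups.

    Hypothetical helper: groups are added in order of first appearance.
    """
    labels = list(frame[key].unique())
    out = {}
    for i in range(1, len(labels) + 1):
        selected = labels[:i]
        # Rows belonging to any of the labels selected so far.
        mask = frame[key].isin(selected)
        out[' or '.join(selected)] = frame.loc[mask, value].agg(func)
    return pd.Series(out, name=value)

sums = cumulative_group_agg(df, 'B', 'A', 'sum')
medians = cumulative_group_agg(df, 'B', 'A', 'median')
print(sums)
print(medians)
```

This rescans the frame once per group, so it trades speed for generality; for sums the groupby + expanding version is the better fit.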