GroupBy and aggregate with set intersection
Problem description
I have a pandas DataFrame with a column of sets:
import pandas as pd
df = pd.DataFrame({
    'group_var': [1, 1, 2, 2],
    'sets_var': [{0, 1}, {1, 2}, {3, 4}, {5, 6, 7}],
})
df
   group_var   sets_var
0          1     {0, 1}
1          1     {1, 2}
2          2     {3, 4}
3          2  {5, 6, 7}
I want to group by group_var and get the intersection of all the corresponding sets in sets_var, like so:
   group_var sets_var
0          1      {1}
1          2       {}
Or as a Series like this (here {} is how pandas displays an empty set, not a dict):
  sets_var
1      {1}
2       {}
What is an elegant way to do this? Performance is the top priority.
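For reference, the operation being asked for can be sketched in plain Python without pandas: bucket the sets by key, then intersect each bucket. The helper name intersect_by_key and the list-of-pairs input format are hypothetical, chosen only to illustrate the expected semantics.

```python
from collections import defaultdict


def intersect_by_key(pairs):
    """Group (key, set) pairs by key and intersect the sets in each group.

    Hypothetical helper for illustration; keys keep first-seen order.
    """
    buckets = defaultdict(list)
    for key, s in pairs:
        buckets[key].append(s)
    # Every bucket holds at least one set, so set.intersection(*sets) is safe here.
    return {key: set.intersection(*sets) for key, sets in buckets.items()}


rows = [(1, {0, 1}), (1, {1, 2}), (2, {3, 4}), (2, {5, 6, 7})]
print(intersect_by_key(rows))  # {1: {1}, 2: set()}
```

Note that Python itself prints the empty set as set(); the {} in the tables above is pandas' display format.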
Recommended answer
Use groupby and agg, intersecting each group's sets with set.intersection:
df.groupby('group_var', as_index=False).agg(lambda x: set.intersection(*x))
   group_var sets_var
0          1      {1}
1          2       {}
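The lambda works by calling set.intersection on the class and unpacking the group with *: the first set becomes self and the remaining sets are intersected with it. The snippet below illustrates this on plain lists, which stand in (for illustration only) for the per-group Series that pandas passes to the lambda.

```python
group_1 = [{0, 1}, {1, 2}]
group_2 = [{3, 4}, {5, 6, 7}]

# Called on the class, the first unpacked set is bound as `self`
# and the rest are the sets to intersect with.
print(set.intersection(*group_1))  # {1}
print(set.intersection(*group_2))  # set()

# Same result as chaining the & operator when the number of sets is known.
assert set.intersection(*group_1) == group_1[0] & group_1[1]

# A one-set group yields a new copy of that set, not the same object.
single = [{7, 8}]
copied = set.intersection(*single)
assert copied == {7, 8} and copied is not single[0]

# With no sets at all there is nothing to bind to `self`, so this raises TypeError.
try:
    set.intersection(*[])
except TypeError as exc:
    print("TypeError:", exc)
```

With the integer group keys used here every group contains at least one row, so the lambda never sees the empty case; groupers that can yield empty groups (for example unobserved categories) would need a guard.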
If performance is absolutely critical, you can try getting rid of the lambda by folding operator.and_ (the set & operator) over each group with functools.reduce:
from functools import partial, reduce
import operator
p = partial(reduce, operator.and_)
df.groupby('group_var', as_index=False).agg(p)
   group_var sets_var
0          1      {1}
1          2       {}
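To make clear what p does on its own: calling p(values) is the same as reduce(operator.and_, values), which folds & over the sets from left to right. This small check on plain lists of sets (chosen for illustration) confirms it matches the set.intersection version.

```python
import operator
from functools import partial, reduce

# p(iterable) is equivalent to reduce(operator.and_, iterable).
p = partial(reduce, operator.and_)

sets = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
folded = p(sets)

# The fold expands to ((s1 & s2) & s3).
assert folded == (sets[0] & sets[1]) & sets[2]
# It agrees with the single set.intersection call from the lambda version.
assert folded == set.intersection(*sets)
print(folded)  # {2}

# Unlike set.intersection, reduce returns a lone element unchanged (same object).
only = {5}
assert p([only]) is only
```

One practical consequence: in a group with a single row, the aggregated cell may be the very same set object as the one in df, so mutating one would affect the other.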
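Both versions compute the same intersection; reduce simply applies & to the sets one pair at a time, ((s1 & s2) & s3) and so on, instead of making a single set.intersection call. Whether that is actually faster depends on your data, so benchmark both.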
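Since the speed comparison is left open, a small timeit sketch like the following can settle it for a particular data shape. It times only the per-group intersection on plain lists, not the pandas groupby overhead (which may well dominate), and the synthetic data below is an assumption for illustration; no particular winner is implied.

```python
import operator
import random
import timeit
from functools import reduce

# Synthetic data (illustrative assumption): 1,000 groups of 5 sets,
# each set holding 50 distinct integers drawn from 0..99.
random.seed(0)
groups = [[set(random.sample(range(100), 50)) for _ in range(5)] for _ in range(1000)]


def with_intersection(groups):
    return [set.intersection(*g) for g in groups]


def with_reduce(groups):
    return [reduce(operator.and_, g) for g in groups]


# Both approaches must agree before their timings mean anything.
assert with_intersection(groups) == with_reduce(groups)

for fn in (with_intersection, with_reduce):
    seconds = timeit.timeit(lambda: fn(groups), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```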
Alternatively, to get a Series:
pd.Series({
k: set.intersection(*g.tolist())
for k, g in df.groupby('group_var')['sets_var']})
1    {1}
2     {}
dtype: object
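A variant that stays inside pandas (not part of the original answer) is to select the column before aggregating, so that SeriesGroupBy.agg returns the Series directly instead of building it from a dict comprehension. This is a sketch only; it has not been benchmarked against the comprehension above.

```python
import pandas as pd

df = pd.DataFrame({
    'group_var': [1, 1, 2, 2],
    'sets_var': [{0, 1}, {1, 2}, {3, 4}, {5, 6, 7}],
})

# Selecting 'sets_var' first gives a SeriesGroupBy, so agg returns a Series
# indexed by group_var and named sets_var rather than a DataFrame.
result = df.groupby('group_var')['sets_var'].agg(lambda s: set.intersection(*s))
print(result)
```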