Aggregate unique values from multiple columns with pandas GroupBy
Question
I went through countless threads (1 2 3...) and still can't find a solution to my problem. I have a dataframe like this:
prop1  prop2  prop3   prop4
L30    3      bob     11.2
L30    54     bob     10
L30    11     john    10
L30    10     bob     10
K20    12     travis  10
K20    1      travis  4
K20    66     leo     10
I would like to group by prop1 and, at the same time, aggregate all the other columns, keeping only the unique values. Like this:
prop1  prop2       prop3       prop4
L30    3,54,11,10  bob,john    11.2,10
K20    12,1,66     travis,leo  10,4
I tried different methods:
- df.groupby('prop1')['prop2','prop3','prop4'].apply(np.unique)
returns
AttributeError: 'numpy.ndarray' object has no attribute 'index' PLUS TypeError: Series.name must be a hashable type
- Also:
.apply(lambda x: pd.unique(x.values.ravel()).tolist())
which gives a list as output, and I would like columns.
- df.groupby('prop1')['prop2','prop3','prop4'].unique()
by itself doesn't work because there are multiple columns.
- .apply(f), with f being:
def f(df):
    df['prop2']=df['prop2'].drop_duplicates()
    df['prop3']=df['prop3'].drop_duplicates()
    df['prop4']=df['prop4'].drop_duplicates()
    return df
doesn't do anything.
- I also tried to use .agg() with different options but didn't succeed.
Does anyone have an idea?
Thanks a lot :)
Recommended answer
Use groupby and agg, and aggregate only unique values by calling Series.unique:
df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))
            prop2       prop3      prop4
prop1
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0
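By default groupby sorts the group keys, so K20 comes first. To keep the groups in their order of first appearance, pass sort=False:
df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))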
            prop2       prop3      prop4
prop1
L30    3,54,11,10    bob,john  11.2,10.0
K20       12,1,66  travis,leo   10.0,4.0
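For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt from the question's sample data, and the helper name join_unique is only an illustrative choice, not part of pandas.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "L30", "K20", "K20", "K20"],
    "prop2": [3, 54, 11, 10, 12, 1, 66],
    "prop3": ["bob", "bob", "john", "bob", "travis", "travis", "leo"],
    "prop4": [11.2, 10, 10, 10, 10, 4, 10],
})


def join_unique(values: pd.Series) -> str:
    """Join the distinct values of one column, in order of first appearance."""
    return ",".join(values.unique())


# astype(str) is needed because str.join only accepts strings.
# sort=False keeps the groups in the order they first appear (L30, then K20).
result = df.astype(str).groupby("prop1", sort=False).agg(join_unique)
print(result)
```

Note that prop4 holds floats, so after astype(str) the value 10 is rendered as 10.0, which is why the output differs slightly from the 11.2,10 shown in the question.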
If handling NaNs is important, call fillna beforehand:
import re
df.fillna('').astype(str).groupby('prop1').agg(
    lambda x: re.sub(',+', ',', ','.join(x.unique()))
)
            prop2       prop3      prop4
prop1
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0
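One caveat with the fillna('') variant: the regex collapses repeated commas, but if the empty string is the first or last unique value in a group, a leading or trailing comma is left behind. A possible alternative, sketched below, is to drop missing values inside the aggregation function instead. The data here is hypothetical (the question's columns with a few NaNs added), and join_unique_non_null is an illustrative helper name.

```python
import numpy as np
import pandas as pd

# Hypothetical data: columns like the question's, with missing values added.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "K20", "K20"],
    "prop2": [3, 54, 3, 12, 1],
    "prop3": [np.nan, "bob", "john", "travis", np.nan],
})


def join_unique_non_null(values: pd.Series) -> str:
    """Drop missing values, then join the remaining distinct values."""
    return ",".join(values.dropna().astype(str).unique())


# No frame-wide astype(str) here: that would turn NaN into the string "nan".
# The conversion happens per column, after the missing values are dropped.
result = df.groupby("prop1", sort=False).agg(join_unique_non_null)
print(result)
```

With this version no regex cleanup is needed, because a missing value never reaches the join. A group that has only missing values in a column ends up with an empty string there.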