Aggregate unique values from multiple columns with pandas GroupBy

Views: 611. Published: 2020/5/23 23:38:06. Tags: python pandas dataframe unique pandas-groupby

Problem description

I went through countless threads (1 2 3...) and still can't find a solution to my problem. I have a dataframe like this:

prop1 prop2 prop3    prop4 
L30   3     bob      11.2
L30   54    bob      10
L30   11    john     10
L30   10    bob      10
K20   12    travis   10 
K20   1     travis   4 
K20   66    leo      10

I would like to do a groupby on prop1 and, at the same time, get all the other columns aggregated, but only with unique values. Like this:

prop1  prop2       prop3       prop4
L30    3,54,11,10  bob,john    11.2,10
K20    12,1,66     travis,leo  10,4

I tried different approaches:

df.groupby('prop1')['prop2','prop3','prop4'].apply(np.unique) returns

AttributeError: 'numpy.ndarray' object has no attribute 'index' PLUS TypeError: Series.name must be a hashable type

Also: .apply(lambda x: pd.unique(x.values.ravel()).tolist()), which gives a list as output, whereas I would like columns.

df.groupby('prop1')['prop2','prop3','prop4'].unique() by itself doesn't work because there are multiple columns.
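For context, unique is available on a single-column selection of a groupby, so one way to see what the multi-column call is missing is to apply it column by column and reassemble the pieces. The sketch below assumes the sample data from the question; the names grouped, cols and unique_arrays are illustrative only. Note also that recent pandas versions (2.0 and later) no longer accept a bare tuple of column names after groupby, so a list is used for the selection.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "L30", "K20", "K20", "K20"],
    "prop2": [3, 54, 11, 10, 12, 1, 66],
    "prop3": ["bob", "bob", "john", "bob", "travis", "travis", "leo"],
    "prop4": [11.2, 10, 10, 10, 10, 4, 10],
})

grouped = df.groupby("prop1", sort=False)

# unique() works on one column at a time and returns one array per group.
prop3_unique = grouped["prop3"].unique()

# Applying it per column and rebuilding a frame gives one column per property,
# with an array of unique values in each cell.
cols = ["prop2", "prop3", "prop4"]
unique_arrays = pd.DataFrame({c: grouped[c].unique() for c in cols})
print(unique_arrays)
```

The cells here hold arrays rather than comma-separated text, which may or may not be what is wanted downstream.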

.apply(f), with f being:

def f(df):
    df['prop2'] = df['prop2'].drop_duplicates()
    df['prop3'] = df['prop3'].drop_duplicates()
    df['prop4'] = df['prop4'].drop_duplicates()
    return df

does nothing.
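One plausible reason this approach cannot collapse a group, sketched on a single small column taken from the question's data: drop_duplicates returns a shorter Series, but assigning it back to a column aligns on the index, so the frame keeps its original number of rows and the dropped positions become NaN. The variable name deduped is illustrative only.

```python
import pandas as pd

# The prop3 values of the L30 group from the question.
df = pd.DataFrame({"prop3": ["bob", "bob", "john", "bob"]})

# drop_duplicates keeps the first occurrence of each value (rows 0 and 2).
deduped = df["prop3"].drop_duplicates()

# Assignment aligns on the index: rows 1 and 3 have no match and become NaN,
# so the frame still has four rows.
df["prop3"] = deduped
print(df)
```

Reducing each group to a single row therefore needs an aggregation rather than a column-by-column assignment.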

I also tried to use .agg() with different options, but without success.

Would any of you have an idea?

Thanks a lot :)

Recommended answer

Use groupby and agg, and aggregate only unique values by calling Series.unique:

df.astype(str).groupby('prop1').agg(lambda x: ','.join(x.unique()))

            prop2       prop3      prop4
prop1                                   
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0

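To keep the groups in their order of first appearance instead of sorting them by key, pass sort=False:

df.astype(str).groupby('prop1', sort=False).agg(lambda x: ','.join(x.unique()))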

            prop2       prop3      prop4
prop1                                   
L30    3,54,11,10    bob,john  11.2,10.0
K20       12,1,66  travis,leo   10.0,4.0
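The two calls above can be reproduced end to end with the sketch below, which rebuilds the question's sample data. The helper name join_unique and the variable names are illustrative and not part of the original answer. Casting with astype(str) is what allows ','.join to work on the numeric columns, and it is also why the prop4 values appear as 10.0 and 4.0 rather than 10 and 4.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "L30", "K20", "K20", "K20"],
    "prop2": [3, 54, 11, 10, 12, 1, 66],
    "prop3": ["bob", "bob", "john", "bob", "travis", "travis", "leo"],
    "prop4": [11.2, 10, 10, 10, 10, 4, 10],
})


def join_unique(col):
    """Join the unique values of one column, in order of first appearance."""
    return ",".join(col.unique())


as_text = df.astype(str)  # every value becomes a string so it can be joined

# Default behaviour: groups are sorted by key (K20 before L30).
sorted_result = as_text.groupby("prop1").agg(join_unique)

# sort=False keeps groups in order of first appearance (L30 before K20).
ordered_result = as_text.groupby("prop1", sort=False).agg(join_unique)

# reset_index turns prop1 back into an ordinary column, as in the desired output.
flat = ordered_result.reset_index()
print(flat)
```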

If handling NaNs is important, call fillna beforehand:

import re
df.fillna('').astype(str).groupby('prop1').agg(
    lambda x: re.sub(',+', ',', ','.join(x.unique()))
)

            prop2       prop3      prop4
prop1                                   
K20       12,1,66  travis,leo   10.0,4.0
L30    3,54,11,10    bob,john  11.2,10.0
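As an alternative sketch (not the answer's method), missing values can be dropped inside the aggregation instead of being replaced with empty strings. This avoids the regular expression, and it also avoids a leading or trailing comma, which the fillna('') approach can leave behind when the empty string happens to be the first or last unique value in a group. The data below is a made-up variation of the question's frame with a few NaNs added, and join_unique is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

# Made-up variation of the question's data with some missing values.
df = pd.DataFrame({
    "prop1": ["L30", "L30", "L30", "K20"],
    "prop3": [np.nan, "bob", "john", "leo"],
    "prop4": [11.2, 10.0, np.nan, 4.0],
})


def join_unique(col):
    """Drop NaNs, convert to strings, and join unique values in first-seen order."""
    return ",".join(col.dropna().astype(str).unique())


result = df.groupby("prop1", sort=False).agg(join_unique)
print(result)
```

Here the leading NaN in prop3 of group L30 simply disappears, whereas with fillna('') it would contribute an empty first element to the join.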
