Merge dataframe with aggregation


Problem description

I want to aggregate a dataframe: take the first row of every group and, at the same time, concatenate the values in the 'upc' column:

import pandas as pd

df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'id2': [11, 22, 11, 11, 22, 33, 33, 33, 33, 44, 44, 55, 66, 66, 22, 77, 77],
    'value1': ["1first", "1second", "1third",
               "2first", "2second",
               "3first", "3second", "3third", "3fourth",
               "4first", "4second",
               "5first",
               "6first", "6second", "6third",
               "7first", "7second"],
    'upc': [str(x) for x in range(100, 117)]
})
firsts_df = df.groupby(['id1', 'id2']).first()
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc))
firsts_df.merge(concat_upcs_df, how='inner',left_on=['id1', 'id2'], right_on=['id1', 'id2'])

This results in the following error:

ValueError: can not merge DataFrame with instance of type <class 'pandas.core.series.Series'>

How can I merge an aggregation result with a dataframe? Could I get the same result with a less costly operation?

Recommended answer

I think you need to pass as_index=False to the groupby used with first, and add reset_index() to concat_upcs_df, so that both objects are DataFrames with id1 and id2 as ordinary columns:

firsts_df = df.groupby(['id1', 'id2'], as_index=False).first()
concat_upcs_df = df[['id1', 'id2', 'upc']].groupby(['id1', 'id2']).apply(lambda x: '|'.join(x.upc)).reset_index(name='val')
# Keep the merge result under its own name so the original df stays intact.
merged = firsts_df.merge(concat_upcs_df, how='inner', left_on=['id1', 'id2'], right_on=['id1', 'id2'])
print(merged)
   id1  id2  upc   value1              val
0    1   11  100   1first          100|102
1    1   22  101  1second              101
2    2   11  103   2first              103
3    2   22  104  2second              104
4    3   33  105   3first  105|106|107|108
5    4   44  109   4first          109|110
6    5   55  111   5first              111
7    6   22  114   6third              114
8    6   66  112   6first          112|113
9    7   77  115   7first          115|116
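
The reason the original attempt failed can be checked directly. The sketch below uses a small stand-in dataframe (made up for illustration, not the data from the question) to show that groupby followed by apply returns a Series indexed by the group keys, and that reset_index(name='val') turns it into a DataFrame that merge can use:

```python
import pandas as pd

# Small stand-in for the question's data (assumed, for illustration only).
df = pd.DataFrame({
    'id1': [1, 1, 1, 2],
    'id2': [11, 22, 11, 11],
    'upc': ['100', '101', '102', '103'],
})

# One joined string per (id1, id2) group; the result is a Series
# whose index holds the group keys.
joined = df.groupby(['id1', 'id2'])['upc'].apply('|'.join)
print(type(joined).__name__)

# reset_index(name=...) moves the keys back into columns and names
# the value column, giving a regular DataFrame.
joined_df = joined.reset_index(name='val')
print(list(joined_df.columns))
```

The name 'val' is arbitrary; any column name that does not clash with the existing ones would do.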

You can also use drop_duplicates instead of first, and apply without a lambda. merge also works with on, because the join columns have the same names on the left and right:

firsts_df = df.drop_duplicates(['id1', 'id2'])
concat_upcs_df = df.groupby(['id1', 'id2'])['upc'].apply('|'.join).reset_index(name='val')
df = firsts_df.merge(concat_upcs_df, on=['id1', 'id2'])
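
On the "less costly" part of the question, one possible alternative, not part of the original answer, is to skip the merge and compute everything in a single groupby using named aggregation (available in pandas 0.25 and later). This is a minimal sketch on a shortened version of the data. One assumption to keep in mind: the 'first' aggregation takes the first non-null value per column, which matches drop_duplicates only when the first row of each group has no missing values.

```python
import pandas as pd

# Shortened version of the question's data (assumed, for illustration only).
df = pd.DataFrame({
    'id1': [1, 1, 1, 2, 2],
    'id2': [11, 22, 11, 11, 22],
    'value1': ['1first', '1second', '1third', '2first', '2second'],
    'upc': ['100', '101', '102', '103', '104'],
})

# Named aggregation: each keyword is an output column,
# given as (input column, aggregation function).
result = (
    df.groupby(['id1', 'id2'])
      .agg(
          value1=('value1', 'first'),   # first non-null value1 in the group
          upc=('upc', 'first'),         # first non-null upc in the group
          val=('upc', '|'.join),        # all upc values joined with '|'
      )
      .reset_index()                    # turn id1, id2 back into columns
)
print(result)
```

Whether this is actually faster than drop_duplicates plus merge depends on the data size and pandas version, so it is worth timing both on the real dataframe.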
