Pandas: Groupby multiple columns, then attach a calculated column to an existing dataframe


Problem description


This is essentially the same question as "Attach a calculated column to an existing dataframe"; however, the solution posted there doesn't work when you group by more than one column.

I have a dataframe df:

id    country     source
-------------------------
1     1           1
1     2           1
1     2           2
1     3           1
2     1           1          

I want to add a column with the list of sources for that (id,country):

df['source_list'] = df.groupby(['id','country'])['source'].apply(lambda x: list(set(x.tolist())))


id    country     source   source_list
--------------------------------------
1     1           1           [1]
1     2           1           [1,2]
1     2           2           [1,2]
1     3           1           [1]
2     1           1           [1]

This line prints just fine:

df.groupby(['id','country'])['source'].apply(lambda x: list(set(x.tolist())))

But I can't attach it as a new column in the df - I get the error:

TypeError: incompatible index of inserted column with frame index

And the solution suggested in the linked SO question above doesn't work here either.

I'm using pandas 0.14.

Solution

Maybe not the most elegant way, but you can merge the grouped result with the initial DataFrame:

>>> df1 = df.groupby(['id','country'])['source'].apply(lambda x: x.tolist()).reset_index()
>>> df1
  id  country      source
0  1        1       [1.0]
1  1        2  [1.0, 2.0]
2  1        3       [1.0]
3  2        1       [1.0]
>>> df2 = df[['id', 'country']]
>>> df2
  id  country
1  1        1
2  1        2
3  1        2
4  1        3
5  2        1
>>> pd.merge(df1, df2, on=['id', 'country'])
  id  country      source
0  1        1       [1.0]
1  1        2  [1.0, 2.0]
2  1        2  [1.0, 2.0]
3  1        3       [1.0]
4  2        1       [1.0]
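
For reference, here is a self-contained sketch of the same merge idea that keeps the original `source` column and names the new one `source_list`, as the question asked. The direct assignment most likely fails because the grouped result is indexed by `(id, country)` while `df` has an ordinary row index, so the two cannot be aligned. The sample data is rebuilt from the table in the question, the variable names `source_lists` and `result` are only illustrative, and the snippet assumes a reasonably recent pandas rather than 0.14.

```python
import pandas as pd

# Sample frame rebuilt from the table in the question.
df = pd.DataFrame({
    "id":      [1, 1, 1, 1, 2],
    "country": [1, 2, 2, 3, 1],
    "source":  [1, 1, 2, 1, 1],
})

# One row per (id, country), holding the sorted unique sources as a list.
source_lists = (
    df.groupby(["id", "country"])["source"]
      .apply(lambda s: sorted(set(s.tolist())))
      .rename("source_list")
      .reset_index()
)

# A left merge keeps every original row, in its original order, and copies
# the per-group list onto each row belonging to that group.
result = df.merge(source_lists, on=["id", "country"], how="left")
print(result)
```

Merging on the key columns sidesteps the index alignment problem because the match is made on the values of `id` and `country`, not on the index.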
