Insert rows as a result of a groupby operation into the original dataframe


Problem description

For example, I have a pandas dataframe as follows:

col_1   col_2   col_3  col_4
a       X        5      1
a       Y        3      2
a       Z        6      4
b       X        7      8
b       Y        4      3
b       Z        6      5

For each value in col_1, I want to sum the values in col_3 and col_4 (and many more columns) over the rows where col_2 is X or Z, and add a new row holding those sums. The output would look like this:

col_1   col_2   col_3  col_4 
a       X        5      1
a       Y        3      2
a       Z        6      4
a       NEW      11     5
b       X        7      8
b       Y        4      3
b       Z        6      5
b       NEW      13     13

There could also be more values in col_1 that need the same treatment, so I can't explicitly reference 'a' and 'b'. I tried combining groupby('col_1') with apply(), but couldn't get it to work. The attempt below comes close, but I can't get it to put 'NEW' in col_2 or to keep the original value (a, b, etc.) in col_1.

df.append(df[(df['col_2'] == 'X') | (df['col_2'] == 'Z')].groupby('col_1').mean())

Thanks.

Solution

If you can guarantee that X and Z appear only once in a group, you can use a groupby and pd.concat operation:

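import pandas as pd

# Sum the X and Z rows within each col_1 group, then label the result rows 'NEW'.
new = df[df.col_2.isin(['X', 'Z'])]\
      .groupby(['col_1'], as_index=False).sum()\
      .assign(col_2='NEW')

# A stable sort keeps each 'NEW' row after the original rows of its group.
df = pd.concat([df, new]).sort_values('col_1', kind='mergesort')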

df
  col_1 col_2  col_3  col_4
0     a     X      5      1
1     a     Y      3      2
2     a     Z      6      4
0     a   NEW     11      5
3     b     X      7      8
4     b     Y      4      3
5     b     Z      6      5
1     b   NEW     13     13
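
For anyone who wants to run the approach end to end, here is a minimal self-contained sketch. It rebuilds the example frame from the question and wraps the steps in a helper; the function name add_group_totals and its parameters are made up for illustration and are not part of pandas. It assumes every column other than col_1 and col_2 is numeric, selects those columns explicitly before summing, and uses a stable sort so each NEW row lands after the original rows of its group.

```python
import pandas as pd

# Example data copied from the question.
df = pd.DataFrame({
    "col_1": ["a", "a", "a", "b", "b", "b"],
    "col_2": ["X", "Y", "Z", "X", "Y", "Z"],
    "col_3": [5, 3, 6, 7, 4, 6],
    "col_4": [1, 2, 4, 8, 3, 5],
})


def add_group_totals(frame, labels=("X", "Z"), new_label="NEW"):
    """Hypothetical helper: append one summed row per col_1 group."""
    # Assumption: every column except col_1 and col_2 holds numeric values.
    value_cols = [c for c in frame.columns if c not in ("col_1", "col_2")]

    # Keep only the rows to be summed, then sum per col_1 group.
    totals = (
        frame[frame["col_2"].isin(labels)]
        .groupby("col_1", as_index=False)[value_cols]
        .sum()
        .assign(col_2=new_label)
    )

    # Append the totals, then sort stably so each new row follows its group.
    combined = pd.concat([frame, totals], ignore_index=True)
    combined = combined.sort_values("col_1", kind="mergesort")

    # Restore the original column order and renumber the rows.
    return combined.reset_index(drop=True)[list(frame.columns)]


result = add_group_totals(df)
print(result)
```

Unlike the output shown above, this version renumbers the index from 0, which avoids duplicate index labels after the concat. Drop the reset_index call if you prefer to keep the original labels.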
