Python pandas groupby conditional concatenate strings into multiple columns

Question

I am trying to group a DataFrame by one column, keep several columns from one row in each group, and concatenate strings from the other rows into multiple columns based on the value of another column. Here is an example...

import pandas as pd

df = pd.DataFrame({'test' : ['a','a','a','a','a','a','b','b','b','b'],
     'name' : ['aa','ab','ac','ad','ae','ba','bb','bc','bd','be'],
     'amount' : [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
     'role' : ['x','y','y','x','x','z','y','y','z','y']})

df

      amount    name    role    test
0        1.0    aa      x       a
1        2.0    ab      y       a
2        3.0    ac      y       a
3        4.0    ad      x       a
4        5.0    ae      x       a
5        6.0    ba      z       a
6        7.0    bb      y       b
7        8.0    bc      y       b
8        9.0    bd      z       b
9        9.5    be      y       b

I would like to group by test, keep name and amount from the row where role = 'z', create a column (let's call it X) that concatenates the values of name where role = 'x', and another column (let's call it Y) that concatenates the values of name where role = 'y', with the concatenated values separated by '; '. For each value of test there could be zero or more rows with role = 'x', zero or more rows with role = 'y', and exactly one row with role = 'z'. X and Y can be null if that test has no rows for the corresponding role. The amount values of all rows with role = 'x' or 'y' are dropped. The desired output would be something like:

     test   name     amount        X              Y
0    a      ba          6.0        aa; ad; ae     ab; ac
1    b      bd          9.0        None           bb; bc; be
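Any approach that lines up the 'z' row with the concatenated names per test quietly relies on the stated assumption of exactly one role = 'z' row per test. A minimal sketch for checking that assumption up front, using the sample data above; the names z_counts and bad_tests are illustrative only.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
                   'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
                   'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y']})

# Count the role == 'z' rows in every test group (groups with none get 0)
z_counts = df['role'].eq('z').astype(int).groupby(df['test']).sum()

# Tests that break the "exactly one 'z' row" assumption
bad_tests = z_counts[z_counts != 1].index.tolist()

if bad_tests:
    raise ValueError(f"tests without exactly one 'z' row: {bad_tests}")
print(bad_tests)  # → []
```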

For the concatenating part, I found x.ix[x.role == 'x', X] = "{%s}" % '; '.join(x['name']), which I might be able to repeat for y. I tried a few things along the lines of name = x[x.role == 'z'].name.first() for name and amount. I also tried going down both paths of a defined function and a lambda function without success. Appreciate any thoughts.
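The defined-function path can work if the function returns one Series per group, which groupby().apply then stacks into one row per test. A minimal sketch under that approach, assuming at most one 'z' row per test; summarize is a hypothetical helper name, and the boolean masks use .loc because .ix was removed in pandas 1.0.

```python
import pandas as pd

df = pd.DataFrame({'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
                   'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
                   'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y']})


def summarize(g):
    """Collapse one test group into a single row (hypothetical helper)."""
    z = g[g['role'] == 'z']
    x_names = g.loc[g['role'] == 'x', 'name']
    y_names = g.loc[g['role'] == 'y', 'name']
    return pd.Series({
        # Assumes at most one 'z' row: take it if present, else None
        'name': z['name'].iloc[0] if len(z) else None,
        'amount': z['amount'].iloc[0] if len(z) else None,
        # Join names with '; ', or None when the role is absent in this group
        'X': '; '.join(x_names) if len(x_names) else None,
        'Y': '; '.join(y_names) if len(y_names) else None,
    })


# Select the non-grouping columns so the grouping key is not passed to summarize
result = (df.groupby('test')[['name', 'amount', 'role']]
            .apply(summarize)
            .reset_index())
print(result)
```

Doing the work in one Python function per group is easy to read and extend, but it is slower than the vectorized groupby/unstack answer on large frames.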

Answer

# index by test and role, then take the cross-section where role is 'z'
z = df.set_index(['test', 'role']).xs('z', level='role')
# drop the 'z' rows, group by 'test' and 'role', and join the names with '; '
xy = df.query('role != "z"').groupby(['test', 'role'])['name'].apply('; '.join).unstack()
# rename the role columns x and y to X and Y
xy.columns = xy.columns.str.upper()

# align both frames on test, then move test back into a column
pd.concat([z, xy], axis=1).reset_index()
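A variant of the same idea, sketched with pivot_table and a merge instead of xs and concat; this is an alternative technique, not the answer's own code, and it assumes one 'z' row per test.

```python
import pandas as pd

df = pd.DataFrame({'test': ['a', 'a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b'],
                   'name': ['aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be'],
                   'amount': [1, 2, 3, 4, 5, 6, 7, 8, 9, 9.5],
                   'role': ['x', 'y', 'y', 'x', 'x', 'z', 'y', 'y', 'z', 'y']})

# One row per test carrying the 'z' name and amount
z = df.loc[df['role'] == 'z', ['test', 'name', 'amount']]

# Join names per (test, role) for the non-'z' rows; roles become columns x, y
xy = (df[df['role'] != 'z']
      .pivot_table(index='test', columns='role', values='name',
                   aggfunc='; '.join)
      .rename(columns=str.upper)
      .reset_index())

# Left merge keeps every test that has a 'z' row, even with no x/y rows
result = z.merge(xy, on='test', how='left')
print(result)
```

The left merge drops any test that has no 'z' row at all, whereas pd.concat(axis=1) keeps it with a missing name and amount; pick whichever matches how such tests should be reported.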
