How to stack this specific row on pandas?

Problem description

Consider the DataFrame below:

import pandas as pd

df_dict = {'name': {0: '  john',
  1: '  john',
  4: ' daphne '},
 'address': {0: 'johns address',
  1: 'johns address',
  4: 'daphne address'},
 'phonenum1': {0: 7870395,
  1: 7870450,
  4: 7373209},
 'phonenum2': {0: None, 1: 123450 , 4: None},
 'phonenum3': {0: None, 1: 123456, 4: None}
}

df = pd.DataFrame(df_dict)

     name         address  phonenum1  phonenum2  phonenum3
0    john   johns address    7870395        NaN        NaN
1    john   johns address    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN

How can I unstack the phonenum data so that rows sharing the same name and address are merged into one row, with the output presented as below?


     name         address  phonenum1  phonenum2  phonenum3  phonenum4
0    john   johns address    7870395    7870450   123450.0   123456.0
4  daphne  daphne address    7373209        NaN        NaN        NaN

Solution

You can do this with set_index and stack, then use groupby.cumcount per name and address to number the values (these numbers become the new column suffixes), then unstack, and finish with reset_index and rename_axis for cosmetics.

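df_ = (df.set_index(['name', 'address'])
         .stack()                    # long format: one row per phone number
         .dropna()                   # added for safety: newer pandas versions may keep NaN rows in stack()
         .reset_index(level=-1)      # move the old column labels out of the index
         .assign(cc=lambda x: x.groupby(level=['name', 'address']).cumcount()+1)  # running count per name/address
         .set_index('cc', append=True)
         [0].unstack()               # one column per running count
         .add_prefix('phonenum')
         .reset_index()
         .rename_axis(columns=None)  # remove the leftover 'cc' axis label
      )
print(df_)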
       name         address  phonenum1  phonenum2  phonenum3  phonenum4
0      john   johns address  7870395.0  7870450.0   123450.0   123456.0
1   daphne   daphne address  7373209.0        NaN        NaN        NaN

Because the chain is wrapped in parentheses, you can comment out every line from the second through the last one before the closing parenthesis, then uncomment them one at a time to see what each step does.
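
As an alternative to commenting lines in and out, the same chain can be split into named intermediate steps and each one printed. This is only a sketch of the approach described above: the variable names (keys, stacked, labelled, numbered, wide, result) are made up for illustration, and the explicit dropna() is an addition on the assumption that your pandas version may keep NaN rows when stacking.

```python
import pandas as pd

df = pd.DataFrame({
    'name': {0: '  john', 1: '  john', 4: ' daphne '},
    'address': {0: 'johns address', 1: 'johns address', 4: 'daphne address'},
    'phonenum1': {0: 7870395, 1: 7870450, 4: 7373209},
    'phonenum2': {0: None, 1: 123450, 4: None},
    'phonenum3': {0: None, 1: 123456, 4: None},
})

keys = ['name', 'address']  # columns that identify one person

# Step 1: long format, one row per (name, address, original column).
# dropna() is an assumption-driven addition; it does nothing if stack()
# has already dropped the missing values.
stacked = df.set_index(keys).stack().dropna()

# Step 2: move the original column labels out of the index.
# The unnamed values end up in a column labelled 0.
labelled = stacked.reset_index(level=-1)

# Step 3: number the phone numbers within each name/address group (1, 2, 3, ...).
numbered = labelled.assign(cc=labelled.groupby(level=keys).cumcount() + 1)

# Step 4: use that running count as the new column level.
wide = numbered.set_index('cc', append=True)[0].unstack()

# Step 5: cosmetic cleanup of column names and index.
result = wide.add_prefix('phonenum').reset_index().rename_axis(columns=None)

for step in (stacked, labelled, numbered, wide, result):
    print(step, end='\n\n')
```

Printing each intermediate object shows where the phonenum4 column comes from: john has four non-missing values in total, so the running count in step 3 reaches 4.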
