Pandas: Wideset dataframe based on condition


Problem description

I would like to widen the data frame whenever two rows share the same Time and Image values.

Minimal reproducible example:

This data frame:

    Time                Image   YorN    Result      Name    Value
    2020-11-21 13:40:56 W402    Y       ACCEPTED    David   2.11
    2020-11-21 13:41:03 W403    Y       ACCEPTED    David   1.04
    2020-11-21 13:45:16 W404    Y       REJECTED    David   18.31
    2020-11-21 13:45:16 W404    N       ACCEPTED    Super   80.69
    2020-11-21 14:01:01 W405    Y       ACCEPTED    Harry   1.41
    2020-11-21 14:01:07 NaN     nan     NaN         NaN     NaN

Rows 3 and 4 both have the same Time and Image ID, so they should be combined into a single wide row.
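A quick way to confirm which rows share a key is sketched below. It retypes only the Time and Image columns of the first five sample rows; the names `sample` and `is_repeated` are illustrative and not part of the question.

```python
import pandas as pd

# Time and Image values of the first five sample rows (the all-NaN row is left out)
sample = pd.DataFrame({
    "Time": ["2020-11-21 13:40:56", "2020-11-21 13:41:03",
             "2020-11-21 13:45:16", "2020-11-21 13:45:16",
             "2020-11-21 14:01:01"],
    "Image": ["W402", "W403", "W404", "W404", "W405"],
})

# keep=False flags every member of a repeated (Time, Image) pair,
# not only the second and later occurrences
is_repeated = sample.duplicated(subset=["Time", "Image"], keep=False)
print(sample[is_repeated])
```

The two flagged rows sit at zero-based positions 2 and 3, which are rows 3 and 4 when counting from one.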

Desired data frame:

    Time                Image   YorN    Result      Name    Value   Result2  Name2  Value2
    2020-11-21 13:40:56 W402    Y       ACCEPTED    David   2.11    NaN      NaN    NaN
    2020-11-21 13:41:03 W403    Y       ACCEPTED    David   1.04    NaN      NaN    NaN
    2020-11-21 13:45:16 W404    Y       REJECTED    David   18.31   ACCEPTED Super  80.69
    2020-11-21 14:01:01 W405    Y       ACCEPTED    Harry   1.41    NaN      NaN    NaN
    2020-11-21 14:01:07 NaN     nan     NaN         NaN     NaN     NaN      NaN    NaN

Recommended answer

Use DataFrame.set_index with GroupBy.cumcount as a counter for repeated Time/Image pairs, reshape with DataFrame.unstack, sort the columns by counter with DataFrame.sort_index, flatten the MultiIndex columns, and finally convert the Time/Image index back to columns with DataFrame.reset_index:

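# Number repeated (Time, Image) pairs 1, 2, ... and use the counter as a third index level
df = (df.set_index(['Time','Image', df.groupby(['Time','Image']).cumcount().add(1)])
        .unstack()  # move the counter level into the columns
        .sort_index(level=1, axis=1, sort_remaining=False))  # order columns by counter
# Flatten the (name, counter) column pairs into names such as 'Result1'
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
df = df.reset_index()
print(df)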
                  Time Image YorN1   Result1  Name1  Value1 YorN2   Result2  \
0  2020-11-21 13:40:56  W402     Y  ACCEPTED  David    2.11   NaN       NaN   
1  2020-11-21 13:41:03  W403     Y  ACCEPTED  David    1.04   NaN       NaN   
2  2020-11-21 13:45:16  W404     Y  REJECTED  David   18.31     N  ACCEPTED   
3  2020-11-21 14:01:01  W405     Y  ACCEPTED  Harry    1.41   NaN       NaN   
4  2020-11-21 14:01:07   NaN   NaN       NaN    NaN     NaN   NaN       NaN   

   Name2  Value2  
0    NaN     NaN  
1    NaN     NaN  
2  Super   80.69  
3    NaN     NaN  
4    NaN     NaN  
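For readers who want to run the approach end to end, here is a minimal self-contained sketch. It is not the answerer's exact session: the frame is retyped from the question, the all-NaN row is left out (how missing group keys are treated may depend on the pandas version), and the names `counter`, `wide`, and `desired` are illustrative. The last step, which renames the columns toward the layout requested in the question, is an added assumption and only makes sense when no key repeats more than twice.

```python
import pandas as pd

# First five sample rows from the question (the all-NaN row is omitted on purpose)
df = pd.DataFrame({
    "Time": ["2020-11-21 13:40:56", "2020-11-21 13:41:03",
             "2020-11-21 13:45:16", "2020-11-21 13:45:16",
             "2020-11-21 14:01:01"],
    "Image": ["W402", "W403", "W404", "W404", "W405"],
    "YorN": ["Y", "Y", "Y", "N", "Y"],
    "Result": ["ACCEPTED", "ACCEPTED", "REJECTED", "ACCEPTED", "ACCEPTED"],
    "Name": ["David", "David", "David", "Super", "Harry"],
    "Value": [2.11, 1.04, 18.31, 80.69, 1.41],
})

# 1 for the first row of each (Time, Image) pair, 2 for the second, and so on
counter = df.groupby(["Time", "Image"]).cumcount().add(1)

# Use the counter as a third index level, then pivot it into the columns
wide = (df.set_index(["Time", "Image", counter])
          .unstack()
          .sort_index(level=1, axis=1, sort_remaining=False))

# Flatten (name, counter) column pairs into names such as 'Result1'
wide.columns = [f"{name}{i}" for name, i in wide.columns]
wide = wide.reset_index()

# Assumed extra step: drop the second YorN and strip the '1' suffix
# so the columns read Result, Name, Value, Result2, Name2, Value2
desired = (wide.drop(columns="YorN2")
               .rename(columns=lambda c: c[:-1] if c.endswith("1") else c))
print(desired)
```

The counter is what makes the reshape possible: without it, the two W404 rows would produce duplicate index entries and unstack would have nothing to spread them across.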
