Pandas: Wideset dataframe based on condition
Problem description
I would like to widen the data frame (long to wide) whenever two rows share the same Time + Image values.
Minimal reproducible example:
This data frame:
Time Image YorN Result Name Value
2020-11-21 13:40:56 W402 Y ACCEPTED David 2.11
2020-11-21 13:41:03 W403 Y ACCEPTED David 1.04
2020-11-21 13:45:16 W404 Y REJECTED David 18.31
2020-11-21 13:45:16 W404 N ACCEPTED Super 80.69
2020-11-21 14:01:01 W405 Y ACCEPTED Harry 1.41
2020-11-21 14:01:07 NaN nan NaN NaN NaN
Rows 3 and 4 have the same Time and Image id, so they should be combined into a single wide row.
Desired data frame:
Time Image YorN Result Name Value Result2 Name2 Value2
2020-11-21 13:40:56 W402 Y ACCEPTED David 2.11 NaN NaN NaN
2020-11-21 13:41:03 W403 Y ACCEPTED David 1.04 NaN NaN NaN
2020-11-21 13:45:16 W404 Y REJECTED David 18.31 ACCEPTED Super 80.69
2020-11-21 14:01:01 W405 Y ACCEPTED Harry 1.41 NaN NaN NaN
2020-11-21 14:01:07 NaN nan NaN NaN NaN NaN NaN NaN
Recommended answer
Use DataFrame.set_index with GroupBy.cumcount as a counter for repeated Time/Image pairs, reshape with DataFrame.unstack, sort the resulting column levels with DataFrame.sort_index, flatten the MultiIndex in the columns, and finally convert the index levels back to columns with DataFrame.reset_index:
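# Number repeated (Time, Image) pairs 1, 2, ... and use that counter as a third index level
df = (df.set_index(['Time', 'Image', df.groupby(['Time', 'Image']).cumcount().add(1)])
        .unstack()
        .sort_index(level=1, axis=1, sort_remaining=False))
# Flatten (column, counter) pairs into names such as Result1, Result2
df.columns = df.columns.map(lambda x: f'{x[0]}{x[1]}')
df = df.reset_index()
print(df)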
Time Image YorN1 Result1 Name1 Value1 YorN2 Result2 \
0 2020-11-21 13:40:56 W402 Y ACCEPTED David 2.11 NaN NaN
1 2020-11-21 13:41:03 W403 Y ACCEPTED David 1.04 NaN NaN
2 2020-11-21 13:45:16 W404 Y REJECTED David 18.31 N ACCEPTED
3 2020-11-21 14:01:01 W405 Y ACCEPTED Harry 1.41 NaN NaN
4 2020-11-21 14:01:07 NaN NaN NaN NaN NaN NaN NaN
Name2 Value2
0 NaN NaN
1 NaN NaN
2 Super 80.69
3 NaN NaN
4 NaN NaN
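The output above differs from the desired frame in two small ways: the first-occurrence columns carry a 1 suffix, and a YorN2 column appears. The following is a minimal sketch of one way to get closer to the desired layout, not part of the original answer. The names keys, value_cols, counter, repeated and wide are illustrative. Two assumptions are made here: dropna=False (available from pandas 1.1) is passed to groupby so that the row with a missing Image gets its own group, and YorN is kept only for the first occurrence because the desired frame shows no YorN2 column.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Time": pd.to_datetime([
        "2020-11-21 13:40:56", "2020-11-21 13:41:03", "2020-11-21 13:45:16",
        "2020-11-21 13:45:16", "2020-11-21 14:01:01", "2020-11-21 14:01:07",
    ]),
    "Image": ["W402", "W403", "W404", "W404", "W405", np.nan],
    "YorN": ["Y", "Y", "Y", "N", "Y", np.nan],
    "Result": ["ACCEPTED", "ACCEPTED", "REJECTED", "ACCEPTED", "ACCEPTED", np.nan],
    "Name": ["David", "David", "David", "Super", "Harry", np.nan],
    "Value": [2.11, 1.04, 18.31, 80.69, 1.41, np.nan],
})

keys = ["Time", "Image"]                        # columns that identify a duplicate
value_cols = ["YorN", "Result", "Name", "Value"]

# 1 for the first row of each (Time, Image) pair, 2 for the second, ...
# dropna=False is an assumption: it keeps the NaN-Image row in its own group.
counter = df.groupby(keys, dropna=False).cumcount().add(1)

# Move the counter into the columns: one (column, counter) pair per value column
wide = df.set_index(keys + [counter]).unstack()

# Keep the plain name for the first occurrence, add a suffix for later ones
wide.columns = [col if n == 1 else f"{col}{n}" for col, n in wide.columns]

# Explicit column order; YorN is kept only once (assumption from the desired frame)
repeated = [c for c in value_cols if c != "YorN"]
ordered = value_cols + [
    f"{c}{n}" for n in range(2, int(counter.max()) + 1) for c in repeated
]
wide = wide[ordered].reset_index()
print(wide)
```

Building the column order explicitly avoids relying on how sort_index orders columns that share the same counter value. If a (Time, Image) pair repeated more than twice, the same pattern would produce Result3, Name3, Value3 and so on.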