Fill the "na" values with a unique "na" identifier when doing a pandas merge


Question

I want to merge two pandas dataframes.

df1 = 
A   B
2   11
2   13
2   15
2   19
2   25
2   35
2   41
2   47
2   46
2   51
3   9
3   15
3   17
3   23
3   25
3   29
5   4
5   23
5   28

with another dataframe:

   df2 = 
A   B    C
2   11   abc
2   13   cdd
2   35   cdd
2   41   cdd
2   47   cdd
3   9   cdd
3   15   cdd
3   17   cdd
3   23   cdd
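
For anyone who wants to reproduce the problem, the two tables above can be built as follows. This is only a sketch of the sample data as printed in the question; the column dtypes (integers for "A" and "B", strings for "C") are an assumption.

```python
import pandas as pd

# Sample data copied from the tables in the question.
df1 = pd.DataFrame({
    "A": [2] * 10 + [3] * 6 + [5] * 3,
    "B": [11, 13, 15, 19, 25, 35, 41, 47, 46, 51,
          9, 15, 17, 23, 25, 29,
          4, 23, 28],
})

df2 = pd.DataFrame({
    "A": [2] * 5 + [3] * 4,
    "B": [11, 13, 35, 41, 47, 9, 15, 17, 23],
    "C": ["abc"] + ["cdd"] * 8,
})
```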

Both dataframes are sorted by "A" and then "B". I want to merge on columns ['A', 'B']. Where column "C" has no matching data I want to fill it with na, but each contiguous block of missing values should get its own identifier of the form na_uniqueNumber.

How can I update this merge approach:

from functools import reduce
import pandas as pd

data_frames = [df1, df2]
df_update = reduce(lambda left,right: pd.merge(
    left, right, on=['A', 'B'], how='outer'), data_frames).fillna('na')

Note: When other columns are present, the code should replace na with unique values only in column "C".
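
One way to respect this note is to fill only column "C" after the merge, instead of calling fillna on the whole frame. The snippet below is a minimal sketch; the small `left` and `right` frames and the extra column "D" are hypothetical and exist only to show that other columns keep their NaN values.

```python
from functools import reduce

import pandas as pd

# Hypothetical miniature inputs; "D" stands in for any other column.
left = pd.DataFrame({"A": [2, 2, 3], "B": [11, 15, 9]})
right = pd.DataFrame({
    "A": [2, 3],
    "B": [11, 9],
    "C": ["abc", "cdd"],
    "D": [1.0, None],
})

merged = reduce(
    lambda l, r: pd.merge(l, r, on=["A", "B"], how="outer"),
    [left, right],
)

# Fill the placeholder in "C" only; missing values in "D" are left alone.
merged["C"] = merged["C"].fillna("na")
```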

Expected output:

df_update = 
A   B    C
2   11   abc
2   13   cdd
2   15   na_01
2   19   na_01 
2   25   na_01  
2   35   cdd
2   41   cdd
2   47   cdd
2   46   na_02
2   51   na_02
3   9   cdd
3   15   cdd
3   17   cdd
3   23   cdd
3   25   na_03
3   29   na_03
5   4   na_04
5   23   na_04
5   28   na_04

Thanks

Recommended answer

IIUC

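# Rows where C holds the 'na' placeholder after the merge
New = df_update[df_update.C == 'na']

# Within each A, a gap in the original row index starts a new block; cumsum numbers the blocks
s = New.reset_index().groupby('A').apply(lambda x: x['index'].diff().ne(1)).cumsum()

# Append the zero-padded block number, e.g. 'na' -> 'na_01'
df_update.loc[df_update.C == 'na', 'C'] += '_' + s.astype(str).str.pad(2, fillchar='0').values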
df_update
Out[124]: 
    A   B      C
0   2  11    abc
1   2  13    cdd
2   2  15  na_01
3   2  19  na_01
4   2  25  na_01
5   2  35    cdd
6   2  41    cdd
7   2  47    cdd
8   2  46  na_02
9   2  51  na_02
10  3   9    cdd
11  3  15    cdd
12  3  17    cdd
13  3  23    cdd
14  3  25  na_03
15  3  29  na_03
16  5   4  na_04
17  5  23  na_04
18  5  28  na_04
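
The same block numbering can also be sketched without groupby.apply, using only shift and cumsum. This is an alternative to the answer above, not the answer's own method. It assumes every ['A', 'B'] pair in df2 also appears in df1, so a left merge returns the same rows as the outer merge while keeping df1's row order (depending on the pandas version, an outer merge may sort the keys instead). A block is taken to start at a missing row whose previous row is either not missing or has a different "A".

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [2] * 10 + [3] * 6 + [5] * 3,
    "B": [11, 13, 15, 19, 25, 35, 41, 47, 46, 51,
          9, 15, 17, 23, 25, 29, 4, 23, 28],
})
df2 = pd.DataFrame({
    "A": [2] * 5 + [3] * 4,
    "B": [11, 13, 35, 41, 47, 9, 15, 17, 23],
    "C": ["abc"] + ["cdd"] * 8,
})

# Assumption: df2's keys are a subset of df1's, so how="left" loses nothing.
df_update = df1.merge(df2, on=["A", "B"], how="left")

missing = df_update["C"].isna()
prev_missing = missing.shift(fill_value=False)
new_a = df_update["A"].ne(df_update["A"].shift())

# A block starts at a missing row that follows a non-missing row or a change in A.
starts = missing & (~prev_missing | new_a)
block_id = starts.cumsum()

# Only column "C" is touched; other columns would keep their values.
df_update.loc[missing, "C"] = "na_" + block_id[missing].astype(str).str.zfill(2)
```

Because the labels are written only through the `missing` mask on "C", any additional columns in the merged frame are left unchanged.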
