Fill the "na" values with a unique "na" identifier when doing a pandas merge
Problem description
I want to merge two pandas dataframes.
df1 =
A B
2 11
2 13
2 15
2 19
2 25
2 35
2 41
2 47
2 46
2 51
3 9
3 15
3 17
3 23
3 25
3 29
5 4
5 23
5 28
with another dataframe.
df2 =
A B C
2 11 abc
2 13 cdd
2 35 cdd
2 41 cdd
2 47 cdd
3 9 cdd
3 15 cdd
3 17 cdd
3 23 cdd
Both dataframes are sorted by "A" and then "B". I want to merge on columns ['A', 'B'], so that wherever column "C" has no data it is filled with na, but with a distinct na_uniqueNumber for each contiguous block of missing values.
How can I update this merge approach:
from functools import reduce
import pandas as pd
data_frames = [df1, df2]
df_update = reduce(lambda left, right: pd.merge(
    left, right, on=['A', 'B'], how='outer'), data_frames).fillna('na')
Note: When other columns are present, the code should update na with unique values only in "C".
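As a small illustration of that note, the sketch below restricts the fill to one column by passing a dict to `fillna`. The frame and its extra column "D" are hypothetical and are not part of the question's data.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "D" stands in for any other column that also has gaps.
df = pd.DataFrame({
    "A": [2, 2, 3],
    "C": ["abc", np.nan, np.nan],
    "D": [1.0, np.nan, 3.0],
})

# A dict passed to fillna fills only the listed columns,
# so the missing value in "D" is left untouched.
filled = df.fillna({"C": "na"})
print(filled)
```

A blanket `.fillna('na')`, as in the merge code above, would also write the string 'na' into "D".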
Expected output:
df2 =
A B C
2 11 abc
2 13 cdd
2 15 na_01
2 19 na_01
2 25 na_01
2 35 cdd
2 41 cdd
2 47 cdd
2 46 na_02
2 51 na_02
3 9 cdd
3 15 cdd
3 17 cdd
3 23 cdd
3 25 na_03
3 29 na_03
5 4 na_04
5 23 na_04
5 28 na_04
Thanks
Recommended answer
IIUC
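# Rows whose "C" was filled with the 'na' placeholder
New = df_update[df_update.C == 'na']
# Within each "A" group, a jump in the original index starts a new block; cumsum numbers the blocks
s = New.reset_index().groupby('A').apply(lambda x: x['index'].diff().ne(1)).cumsum()
# Append the zero-padded block number to every 'na' in "C"
df_update.loc[df_update.C == 'na', 'C'] += '_' + s.astype(str).str.pad(2, fillchar='0').values
df_update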
Out[124]:
A B C
0 2 11 abc
1 2 13 cdd
2 2 15 na_01
3 2 19 na_01
4 2 25 na_01
5 2 35 cdd
6 2 41 cdd
7 2 47 cdd
8 2 46 na_02
9 2 51 na_02
10 3 9 cdd
11 3 15 cdd
12 3 17 cdd
13 3 23 cdd
14 3 25 na_03
15 3 29 na_03
16 5 4 na_04
17 5 23 na_04
18 5 28 na_04
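For readers who want to reproduce this end to end, here is a self-contained sketch of the same idea. It is not the answer's code. It assumes every key pair in df2 also appears in df1 and is unique there, so it uses a left merge to keep df1's row order. Depending on the pandas version, an outer merge may sort the join keys instead, which would move the (2, 46) row ahead of (2, 47) and change the blocks. The names `merged`, `missing`, `starts` and `block_id` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "A": [2] * 10 + [3] * 6 + [5] * 3,
    "B": [11, 13, 15, 19, 25, 35, 41, 47, 46, 51,
          9, 15, 17, 23, 25, 29, 4, 23, 28],
})
df2 = pd.DataFrame({
    "A": [2] * 5 + [3] * 4,
    "B": [11, 13, 35, 41, 47, 9, 15, 17, 23],
    "C": ["abc"] + ["cdd"] * 8,
})

# Assumption: df2's keys are a unique subset of df1's, so a left merge
# returns one row per df1 row, in df1's order.
merged = df1.merge(df2, on=["A", "B"], how="left")

# Work only on "C"; any other column is left alone.
missing = merged["C"].isna()

# A block starts at a missing row unless the previous row is also
# missing and belongs to the same "A" group.
prev_missing = missing.shift(fill_value=False)
same_group = merged["A"].eq(merged["A"].shift())
starts = missing & ~(prev_missing & same_group)

# Running count of block starts gives each block its number.
block_id = starts.cumsum()
merged.loc[missing, "C"] = "na_" + block_id[missing].astype(str).str.zfill(2)
print(merged)
```

Working from the missing-value mask rather than from a placeholder string avoids mislabeling rows whose real value in "C" happens to be the text 'na'.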