NaNs after merging two dataframes
Problem description
I have two dataframes as follows:
df1
id name
-------------------------
0 43 c
1 23 t
2 38 j
3 9 s
df2
user id
--------------------------------------------------
0 222087 27,26
1 1343649 6,47,17
2 404134 18,12,23,22,27,43,38,20,35,1
3 1110200 9,23,2,20,26,47,37
I want to split all the ids in df2 into multiple rows and join the resultant dataframe to df1 on "id".
I do the following:
import pandas as pd
b = pd.DataFrame(df2['id'].str.split(',').tolist(), index=df2['user']).stack()
b = b.reset_index()[[0, 'user']]  # the split ids are in the column labeled 0
b.columns = ['id', 'user']
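A minimal sketch of the same split step using Series.explode (available in pandas 0.25 and later), assuming a small df2 rebuilt from the first two rows of the table above. It does not depend on stack() discarding the None padding that tolist() creates for shorter rows, and it makes it easy to check that each split piece is still a Python str, not an int.

```python
import pandas as pd

# Sample data rebuilt from the first two rows of df2 in the question
df2 = pd.DataFrame({
    "user": [222087, 1343649],
    "id": ["27,26", "6,47,17"],
})

# Split each comma-separated string into a list, then give every element its own row
b = (
    df2.assign(id=df2["id"].str.split(","))
       .explode("id")
       .reset_index(drop=True)[["id", "user"]]
)

print(b)
print(type(b.loc[0, "id"]))  # the split pieces are still Python str objects
```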
When I try to merge, I get NaNs in the resultant dataframe.
pd.merge(b, df1, on="id", how="left")
id user name
-------------------------------------
0 27 222087 NaN
1 26 222087 NaN
2 6 1343649 NaN
3 47 1343649 NaN
4 17 1343649 NaN
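One quick way to confirm a key-type mismatch before merging, sketched on small frames rebuilt from the question's data (the three-row b here is a hypothetical stand-in for the split result): compare the key dtypes and test membership with isin(), which does an exact value match much like the join keys. Note that recent pandas versions may raise a ValueError about merging object and int64 columns instead of silently returning NaNs; the cause is the same.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
# Hypothetical split result: 'id' holds strings, as it does after str.split
b = pd.DataFrame({"id": ["27", "26", "43"], "user": [222087, 222087, 404134]})

# The key columns differ: a string-holding dtype on the left, an integer dtype on the right
print(b["id"].dtype, df1["id"].dtype)

# Even "43" is not found, because the string "43" is not equal to the integer 43
print(b["id"].isin(df1["id"]).tolist())
```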
So, I tried doing the following:
import numpy as np
b['name'] = np.nan
for i in range(len(df1)):
    b.loc[b['id'] == df1['id'][i], 'name'] = df1['name'][i]
It still gives the same result as above. I am confused as to what could cause this because I am sure both of them should work! Any help would be much appreciated!
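The loop cannot assign anything for the same reason as the merge: each comparison b['id'] == df1['id'][i] pits strings against an int, so the mask is all False. Below is a sketch of a vectorized replacement for the loop, assuming a hypothetical three-row b: build an id-to-name lookup Series from df1 and map() the keys through it after converting them to int.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
# Hypothetical split result with string keys
b = pd.DataFrame({"id": ["27", "43", "9"], "user": [222087, 404134, 1110200]})

# Comparing the str column with an int yields an all-False mask, so the loop never assigns
print((b["id"] == df1["id"][0]).any())

# Vectorized alternative: an id -> name lookup Series, applied with map().
# The keys are cast to int first so both sides use the same type.
lookup = df1.set_index("id")["name"]
b["name"] = b["id"].astype(int).map(lookup)
print(b)
```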
I read similar posts on SO but none seemed to have a concrete answer. I am also not sure whether this is related to the code at all.
Thanks in advance!
Recommended answer
The problem is that the ids you get by splitting df2's id column are strings: the output of the .str string functions is always strings, even when the text looks numeric. Convert that split column to int so it matches df1.id:
b['id'] = b['id'].astype(int)
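A minimal end-to-end sketch of the int route, assuming small frames rebuilt from the question's tables (only two users of df2) and pandas 0.25 or later for explode(). The conversion happens right after splitting, before the merge:

```python
import pandas as pd

# Sample frames rebuilt from the question (only part of df2 is used)
df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
df2 = pd.DataFrame({
    "user": [222087, 404134],
    "id": ["27,26", "18,12,23,22,27,43,38,20,35,1"],
})

# One row per (user, id) pair; the split pieces are strings at this point
b = (
    df2.assign(id=df2["id"].str.split(","))
       .explode("id")
       .reset_index(drop=True)[["id", "user"]]
)

# Convert the split keys to int so they compare equal to df1['id']
b["id"] = b["id"].astype(int)

result = pd.merge(b, df1, on="id", how="left")
print(result.dropna(subset=["name"]))
```

Ids with no counterpart in df1 (such as 27 or 18) still come back with NaN in name, which is the expected behaviour of how="left".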
Another solution is to convert df1.id to string:
df1.id = df1.id.astype(str)
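The string route, sketched on a hypothetical three-row b whose keys are still strings. After df1['id'] is cast with astype(str), both key columns hold strings and the join finds its matches.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
# Hypothetical split result: keys kept as strings, as produced by str.split
b = pd.DataFrame({"id": ["9", "23", "2"], "user": [1110200, 1110200, 1110200]})

# Cast df1's keys to str so both sides of the join hold strings
df1["id"] = df1["id"].astype(str)

result = pd.merge(b, df1, on="id", how="left")
print(result)
```

One trade-off: the merged id column stays as text, so converting the split keys to int may be the tidier choice if the ids are used as numbers later.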
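You get NaNs because nothing matches: str values never compare equal to int values, so the string '43' does not match the integer 43, and the left merge fills name with NaN for every row.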
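If some split pieces might not be valid integers (for example an empty string left behind by a trailing comma, a hypothetical case that does not occur in the question's data), astype(int) raises a ValueError. A defensive variant, sketched here, uses pd.to_numeric with errors='coerce', drops the pieces that could not be parsed, and then casts the rest to int:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
# Hypothetical split result: "" stands for an empty piece from a trailing comma
b = pd.DataFrame({"id": ["43", "", "9"], "user": [404134, 404134, 1110200]})

# Unparseable pieces become NaN instead of raising
b["id"] = pd.to_numeric(b["id"], errors="coerce")

# Drop them, then cast the remaining keys to int to match df1['id']
b = b.dropna(subset=["id"]).astype({"id": int})

result = pd.merge(b, df1, on="id", how="left")
print(result)
```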