NaNs after merging two dataframes


Question

I have the following two dataframes:

df1

         id         name
-------------------------
0        43          c
1        23          t
2        38          j
3         9          s

df2

          user        id
--------------------------------------------------
0         222087      27,26
1         1343649     6,47,17
2         404134      18,12,23,22,27,43,38,20,35,1
3         1110200     9,23,2,20,26,47,37

I want to split all the ids in df2 into multiple rows and join the resultant dataframe to df1 on "id".

I do the following:

import pandas as pd

b = pd.DataFrame(df2['id'].str.split(',').tolist(), index=df2.user).stack()
b = b.reset_index()[[0, 'user']]  # the split ids are currently in the column labeled 0
b.columns = ['id', 'user']
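For anyone who wants to reproduce this, here is a minimal, self-contained sketch that rebuilds df1 and df2 from the sample tables above and splits the ids. It uses DataFrame.explode (available since pandas 0.25) as a shorter stand-in for the str.split/stack combination, which is a substitution for illustration, not the code from the question. Note that the exploded id values are Python strings, not integers:

```python
import pandas as pd

# Sample frames copied from the tables above
df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
df2 = pd.DataFrame({
    "user": [222087, 1343649, 404134, 1110200],
    "id": ["27,26", "6,47,17", "18,12,23,22,27,43,38,20,35,1", "9,23,2,20,26,47,37"],
})

# One row per (user, id) pair; str.split produces lists of strings
b = (
    df2.assign(id=df2["id"].str.split(","))
    .explode("id")[["id", "user"]]
    .reset_index(drop=True)
)

print(len(b))                 # 22 rows
print(type(b["id"].iloc[0]))  # <class 'str'>
print(df1["id"].dtype)        # int64
```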

When I try to merge, I get NaNs in the resultant dataframe.

pd.merge(b, df1, on="id", how="left")

              id       user      name
-------------------------------------
0              27      222087     NaN
1              26      222087     NaN
2              6      1343649     NaN
3              47     1343649     NaN
4              17     1343649     NaN
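The NaNs above were most likely produced by an older pandas release. Since around pandas 0.23, merge detects this particular mismatch (a column of strings merged against an int64 column) and raises a ValueError instead of returning unmatched rows. A small sketch to check what your installed version does; the three-row b here is a hypothetical excerpt of the split result:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
# Hypothetical excerpt of the split result: the ids are strings
b = pd.DataFrame({"id": ["27", "26", "43"], "user": [222087, 222087, 404134]})

try:
    result = pd.merge(b, df1, on="id", how="left")
    raised = False
    print(result)  # older pandas: every name comes back as NaN
except ValueError as err:
    raised = True
    print(err)     # newer pandas: complains about merging object and int64 columns
```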

So, I tried doing the following:

import numpy as np

b['name'] = np.nan
for i in range(len(df1)):
    b.loc[b['id'] == df1['id'][i], 'name'] = df1['name'][i]

It still gives the same result as above. I am confused as to what could cause this because I am sure both of them should work! Any help would be much appreciated!

I read similar posts on SO, but none of them seemed to have a concrete answer. I am also not sure whether this is related to the code at all.

Thanks in advance!

Answer

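The problem is that the id column you get after splitting (b['id'] in the code above) holds strings: the output of string functions such as str.split is always a string, even when the values look numeric. Convert that column to int before merging:

b['id'] = b['id'].astype(int)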
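A runnable sketch of this fix on the sample data from the question. It again uses explode for the split step, which is an assumption of this sketch rather than part of the original code. After the cast, the five rows whose ids also appear in df1 (23, 43 and 38 for user 404134; 9 and 23 for user 1110200) pick up their names, and the remaining rows stay NaN only because those ids are genuinely absent from df1:

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
df2 = pd.DataFrame({
    "user": [222087, 1343649, 404134, 1110200],
    "id": ["27,26", "6,47,17", "18,12,23,22,27,43,38,20,35,1", "9,23,2,20,26,47,37"],
})

b = (
    df2.assign(id=df2["id"].str.split(","))
    .explode("id")
    .astype({"id": int})  # the fix: make the split ids numeric, like df1.id
)

merged = pd.merge(b, df1, on="id", how="left")
print(merged[merged["name"].notna()])  # the 5 matched rows
```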

Another solution is to convert df1.id to string:

df1.id = df1.id.astype(str)
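The same sample data can be used to sketch this alternative. Here the merge also passes indicator=True, a standard DataFrame.merge option that adds a _merge column, which makes it easy to count how many rows actually found a partner in df1 (explode again stands in for the split/stack step):

```python
import pandas as pd

df1 = pd.DataFrame({"id": [43, 23, 38, 9], "name": ["c", "t", "j", "s"]})
df2 = pd.DataFrame({
    "user": [222087, 1343649, 404134, 1110200],
    "id": ["27,26", "6,47,17", "18,12,23,22,27,43,38,20,35,1", "9,23,2,20,26,47,37"],
})

b = df2.assign(id=df2["id"].str.split(",")).explode("id")
df1_str = df1.astype({"id": str})  # the alternative fix: make df1's key a string too

merged = pd.merge(b, df1_str, on="id", how="left", indicator=True)
print(merged["_merge"].value_counts())  # 5 rows "both", 17 rows "left_only"
```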

You get NaNs because nothing matches: str values never compare equal to int values.
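This is also why the loop in the question leaves every name as NaN: the boolean mask b['id'] == df1['id'][i] compares strings with an integer, so it is False everywhere and no row is ever assigned. A tiny sketch of that comparison; the b below is a hypothetical excerpt of the split result:

```python
import pandas as pd

# Hypothetical excerpt of the split result: ids are strings
b = pd.DataFrame({"id": ["18", "12", "23", "43", "38"], "user": [404134] * 5})

mask_int = b["id"] == 43    # str vs int: never equal
mask_str = b["id"] == "43"  # str vs str: matches one row

print(mask_int.sum(), mask_str.sum())  # 0 1
```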
