Concat python dataframes based on unique rows

Question

My dataframes look like:

df1

user_id  username  firstname  lastname
123      abc       abc        abc
456      def       def        def
789      ghi       ghi        ghi

df2

user_id  username  firstname  lastname
111      xyz       xyz        xyz
456      def       def        def
234      mnp       mnp        mnp

Now I want an output dataframe like:

user_id  username  firstname  lastname
123      abc       abc        abc
456      def       def        def
789      ghi       ghi        ghi
111      xyz       xyz        xyz
234      mnp       mnp        mnp

user_id 456 is common to both dataframes, so it should appear only once in the output. I have tried grouping on user_id with groupby(['user_id']), but it looks like groupby needs to be followed by some aggregation, which I don't want here.

Recommended answer

Use drop_duplicates, which keeps the first row for each user_id by default:

df = pd.concat([df1, df2]).drop_duplicates(subset='user_id').reset_index(drop=True)
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
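
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frames are rebuilt from the sample data in the question, and the variable name combined is only illustrative. keep="first" is the default and is spelled out here to make the choice visible.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "user_id": [123, 456, 789],
    "username": ["abc", "def", "ghi"],
    "firstname": ["abc", "def", "ghi"],
    "lastname": ["abc", "def", "ghi"],
})
df2 = pd.DataFrame({
    "user_id": [111, 456, 234],
    "username": ["xyz", "def", "mnp"],
    "firstname": ["xyz", "def", "mnp"],
    "lastname": ["xyz", "def", "mnp"],
})

combined = (
    pd.concat([df1, df2])                              # stack df2 under df1
    .drop_duplicates(subset="user_id", keep="first")   # keep the first row per user_id
    .reset_index(drop=True)                            # renumber rows 0..n-1
)
print(combined)
```

Because df1 comes first in the concat, its row for user_id 456 is the one that survives. Passing keep="last" instead would keep the row from df2.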

A solution with groupby and the first aggregation also works, but it is slower:

df = pd.concat([df1, df2]).groupby('user_id', as_index=False, sort=False).first()
print(df)
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
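
One difference worth knowing about: groupby(...).first() takes the first non-null value in each column, while drop_duplicates keeps the first row as a whole. On the sample data the two agree, but they can diverge when a duplicated row has missing values. The sketch below illustrates this with made-up data (the missing lastname for user_id 456 is an assumption added for the example, not part of the question).

```python
import pandas as pd

# Hypothetical data: user 456 has a missing lastname in df1 but not in df2.
df1 = pd.DataFrame({
    "user_id": [123, 456],
    "username": ["abc", "def"],
    "lastname": ["abc", None],
})
df2 = pd.DataFrame({
    "user_id": [456, 234],
    "username": ["def", "mnp"],
    "lastname": ["def", "mnp"],
})

stacked = pd.concat([df1, df2])

# Keeps the whole first row for each user_id, missing value included.
by_dedup = stacked.drop_duplicates(subset="user_id").reset_index(drop=True)

# Takes the first non-null value per column within each user_id group.
by_group = stacked.groupby("user_id", as_index=False, sort=False).first()

dedup_456 = by_dedup.loc[by_dedup["user_id"] == 456, "lastname"].iloc[0]
group_456 = by_group.loc[by_group["user_id"] == 456, "lastname"].iloc[0]

print(pd.isna(dedup_456))  # lastname is still missing after drop_duplicates
print(group_456)           # groupby().first() filled it from the df2 row
```

If rows should be kept intact rather than merged column by column, drop_duplicates is the safer choice.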

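Another solution uses boolean indexing with numpy.isin (the older numpy.in1d does the same thing for 1-D input but is deprecated since NumPy 2.0):

df = pd.concat([df1, df2[~np.isin(df2['user_id'], df1['user_id'])]], ignore_index=True)
print(df)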
   user_id username firstname lastname
0      123      abc       abc      abc
1      456      def       def      def
2      789      ghi       ghi      ghi
3      111      xyz       xyz      xyz
4      234      mnp       mnp      mnp
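
The same filter can also be written with pandas alone, using Series.isin. This is a sketch of an equivalent approach rather than the code from the answer; new_rows and combined are illustrative names.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "user_id": [123, 456, 789],
    "username": ["abc", "def", "ghi"],
    "firstname": ["abc", "def", "ghi"],
    "lastname": ["abc", "def", "ghi"],
})
df2 = pd.DataFrame({
    "user_id": [111, 456, 234],
    "username": ["xyz", "def", "mnp"],
    "firstname": ["xyz", "def", "mnp"],
    "lastname": ["xyz", "def", "mnp"],
})

# Rows of df2 whose user_id is not already present in df1.
new_rows = df2[~df2["user_id"].isin(df1["user_id"])]

# Append only those rows and renumber the index.
combined = pd.concat([df1, new_rows], ignore_index=True)
print(combined)
```

Note that this only removes df2 rows that clash with df1. Duplicate user_id values inside df1, or inside df2 itself, are left untouched, so drop_duplicates is still needed if either frame may contain repeats.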
