Concat python dataframes based on unique rows
Question
My dataframes look like this:
df1
user_id username firstname lastname
123 abc abc abc
456 def def def
789 ghi ghi ghi
df2
user_id username firstname lastname
111 xyz xyz xyz
456 def def def
234 mnp mnp mnp
Now I want an output dataframe like this:
user_id username firstname lastname
123 abc abc abc
456 def def def
789 ghi ghi ghi
111 xyz xyz xyz
234 mnp mnp mnp
user_id 456 is common to both dataframes, so it should appear only once in the output. I tried a groupby on user_id, groupby(['user_id']), but it looks like groupby needs to be followed by some aggregation, which I don't want here.
Recommended answer
Use drop_duplicates:
import pandas as pd
df = pd.concat([df1, df2]).drop_duplicates('user_id').reset_index(drop=True)
print (df)
user_id username firstname lastname
0 123 abc abc abc
1 456 def def def
2 789 ghi ghi ghi
3 111 xyz xyz xyz
4 234 mnp mnp mnp
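To run this end to end, here is a minimal sketch that rebuilds the two frames from the question. The sample data is copied from the tables above; the variable name result and the explicit subset= and keep= keywords are my own additions, spelled out only to make the default behaviour visible.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "user_id": [123, 456, 789],
    "username": ["abc", "def", "ghi"],
    "firstname": ["abc", "def", "ghi"],
    "lastname": ["abc", "def", "ghi"],
})
df2 = pd.DataFrame({
    "user_id": [111, 456, 234],
    "username": ["xyz", "def", "mnp"],
    "firstname": ["xyz", "def", "mnp"],
    "lastname": ["xyz", "def", "mnp"],
})

# Stack the frames, then keep only the first row seen for each user_id.
result = (
    pd.concat([df1, df2])
    .drop_duplicates(subset="user_id", keep="first")
    .reset_index(drop=True)
)
print(result)
```

keep="first" is the default, so the row from df1 wins when a user_id is present in both frames; keep="last" would take the row from df2 instead. Calling drop_duplicates() with no subset compares whole rows rather than just user_id.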
A solution with groupby and the first aggregation is slower:
df = pd.concat([df1, df2]).groupby('user_id', as_index=False, sort=False).first()
print (df)
user_id username firstname lastname
0 123 abc abc abc
1 456 def def def
2 789 ghi ghi ghi
3 111 xyz xyz xyz
4 234 mnp mnp mnp
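Speed aside, the two approaches can also give different results when the data contains missing values. The sketch below uses made-up frames (left and right are hypothetical, not from the question) to show it: drop_duplicates keeps the first row as a whole, while first() picks the first non-null value per column, so it can blend values from several rows.

```python
import numpy as np
import pandas as pd

# Hypothetical data: user 456 has a missing firstname in the first frame only.
left = pd.DataFrame({
    "user_id": [123, 456],
    "username": ["abc", "def"],
    "firstname": ["abc", np.nan],
})
right = pd.DataFrame({
    "user_id": [456],
    "username": ["def"],
    "firstname": ["def"],
})
stacked = pd.concat([left, right])

# Keeps the entire first row for 456, including its missing firstname.
by_drop = stacked.drop_duplicates("user_id").reset_index(drop=True)

# Takes the first non-null value in each column, so firstname comes from `right`.
by_group = stacked.groupby("user_id", as_index=False, sort=False).first()

print(by_drop)
print(by_group)
```

Which behaviour is preferable depends on the data. If rows with the same user_id are always identical, as in the question, both give the same output.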
Another solution uses boolean indexing with numpy.in1d:
import numpy as np
df = pd.concat([df1, df2[~np.in1d(df2['user_id'], df1['user_id'])]], ignore_index=True)
print (df)
user_id username firstname lastname
0 123 abc abc abc
1 456 def def def
2 789 ghi ghi ghi
3 111 xyz xyz xyz
4 234 mnp mnp mnp
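The same filter can be written with pandas alone using Series.isin, which avoids the NumPy import. As far as I know, np.in1d is deprecated as of NumPy 2.0 in favor of np.isin, so this form may also age better. This is a sketch of an equivalent variant, not the original answer's code; the trimmed two-column frames and the names only_new and result are mine.

```python
import pandas as pd

# Trimmed versions of the question's frames (two columns are enough here).
df1 = pd.DataFrame({"user_id": [123, 456, 789], "username": ["abc", "def", "ghi"]})
df2 = pd.DataFrame({"user_id": [111, 456, 234], "username": ["xyz", "def", "mnp"]})

# Rows of df2 whose user_id does not already occur in df1.
only_new = df2[~df2["user_id"].isin(df1["user_id"])]

# Append only those rows to df1 and renumber the index.
result = pd.concat([df1, only_new], ignore_index=True)
print(result)
```

Note that this only filters df2 against df1. Unlike drop_duplicates, it does not remove a user_id that is repeated inside df1 or inside df2 itself.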