Pandas: Best way to join two dataframes based on a common column
Question
I know this is a basic question, but please hear me out.
I have the following dataframes:
In [722]: m1
Out[722]:
   Person_id  Evidence_14  Feature_14
0        100         90.0        True
1        101          NaN         NaN
2        102         91.0        True
3        103          NaN         NaN
4        104         94.0        True
5        105          NaN         NaN
6        106          NaN         NaN

In [721]: m3
Out[721]:
   Person_id  Evidence_14  Feature_14
0        100          NaN         NaN
1        101         99.0       False
2        102          NaN         NaN
3        103         95.0       False
4        104          NaN         NaN
5        105          NaN         NaN
6        106         93.0       False
Expected output:
In [734]: z
Out[734]:
   Person_id  Evidence_14  Feature_14
0        100         90.0        True
1        101         99.0       False
2        102         91.0        True
3        103         95.0       False
4        104         94.0        True
5        105          NaN         NaN
6        106         93.0       False
I can solve it like this:
In [725]: z = m1.merge(m3, on='Person_id')
In [728]: z['Evidence_14'] = z.Evidence_14_x.combine_first(z.Evidence_14_y)
In [731]: z['Feature_14'] = z.Feature_14_x.combine_first(z.Feature_14_y)
In [733]: z.drop(columns=['Evidence_14_x', 'Evidence_14_y', 'Feature_14_x', 'Feature_14_y'], inplace=True)
In [734]: z
Out[734]:
   Person_id  Evidence_14  Feature_14
0        100         90.0        True
1        101         99.0       False
2        102         91.0        True
3        103         95.0       False
4        104         94.0        True
5        105          NaN         NaN
6        106         93.0       False
But is there a cleaner or better way to do this? Am I missing something very obvious?
Recommended answer
If the column names match in both DataFrames and the rows need to be matched by their Person_id values, use:
m = m1.set_index('Person_id').combine_first(m3.set_index('Person_id')).reset_index()
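To make the one-liner above easier to try, here is a minimal self-contained sketch. It rebuilds the question's m1 and m3 by hand from the printed output (the constructor calls are an assumption, since the question does not show how the frames were created) and then applies the set_index / combine_first / reset_index chain.

```python
import numpy as np
import pandas as pd

# Rebuild the two frames from the question; values are copied from the printed output.
m1 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [90.0, np.nan, 91.0, np.nan, 94.0, np.nan, np.nan],
    "Feature_14": [True, np.nan, True, np.nan, True, np.nan, np.nan],
})
m3 = pd.DataFrame({
    "Person_id": [100, 101, 102, 103, 104, 105, 106],
    "Evidence_14": [np.nan, 99.0, np.nan, 95.0, np.nan, np.nan, 93.0],
    "Feature_14": [np.nan, False, np.nan, False, np.nan, np.nan, False],
})

# Use Person_id as the index so rows are aligned on it, fill the gaps in m1
# with the values from m3, then turn Person_id back into a regular column.
m = (
    m1.set_index("Person_id")
      .combine_first(m3.set_index("Person_id"))
      .reset_index()
)
print(m)
```

Values already present in m1 are kept; m3 only supplies values where m1 has NaN. Person_id 105 stays NaN because neither frame has data for it.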
If the index values are the same in both DataFrames and the Person_id values are also the same, the solution can be simplified by matching on the original index values:
m = m1.combine_first(m3)
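The shortcut above relies on both frames listing the same Person_id at the same index label. The following sketch uses two small hypothetical frames (a and b, not from the question) whose rows are in a different order, to illustrate why the index-based form is only safe under that condition.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the same two people, but listed in opposite row order.
a = pd.DataFrame({"Person_id": [100, 101], "Evidence_14": [90.0, np.nan]})
b = pd.DataFrame({"Person_id": [101, 100], "Evidence_14": [99.0, np.nan]})

# Index-based: row 1 of `a` is paired with row 1 of `b`, regardless of Person_id,
# so the 99.0 recorded for person 101 is never picked up.
by_index = a.combine_first(b)

# Key-based: rows are paired on Person_id, so person 101 receives 99.0.
by_key = (
    a.set_index("Person_id")
     .combine_first(b.set_index("Person_id"))
     .reset_index()
)

print(by_index)
print(by_key)
```

When the row order or the set of people might differ between the frames, the Person_id-indexed form is the safer choice.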