Pandas: Merge two DataFrames without some columns
Question
I'm trying to merge two big CSV files together.
Let's say I've one Pandas DataFrame like the following...
EntityNum    foo    ...
------------------------
1001.01      100
1002.02      50
1003.03      200
And another one like this...
EntityNum    a_col    b_col
-----------------------------------
1001.01      alice    7
1002.02      bob      8
1003.03      777      9
I want to join them like this:
EntityNum    foo    a_col
----------------------------
1001.01      100    alice
1002.02      50     bob
1003.03      200    777
So keep in mind, I don't want b_col in the final result. How do I accomplish this with Pandas?
Using SQL, I should probably have done something like:
SELECT t1.*, t2.a_col FROM table_1 as t1
LEFT JOIN table_2 as t2
ON t1.EntityNum = t2.EntityNum;
Search
I know it is possible to use merge. This is what I've tried:
import pandas as pd
df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')
But I'm stuck when it comes to avoiding some of the unwanted columns in the final dataframe.
Recommended answer
You can first select the relevant dataframe columns via their labels (e.g. df_a[['EntityNum', 'foo']]) and then join those.
df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')
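As a minimal sketch of the same idea on the sample data from the question: if df_a has more columns than foo and all of them should be kept (like t1.* in the SQL), it may be enough to subset only df_b. The DataFrames below are built inline for illustration rather than read from the real CSV files.

```python
import pandas as pd

# Sample data mirroring the tables in the question (built inline for illustration).
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "foo": [100, 50, 200],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Keep every column of df_a, but only the join key and a_col from df_b,
# so b_col never enters the result.
df_c = df_a.merge(df_b[["EntityNum", "a_col"]], on="EntityNum", how="left")
print(df_c)
```

The join key has to be part of the df_b selection; otherwise merge has nothing to match on.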
Note that the default behavior for merge is to do an inner join, which is why how='left' is passed explicitly in the merge call to match the SQL LEFT JOIN.
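The difference between the two join types only shows up when a key is missing on one side. The sketch below assumes a row in the first file (1004.04, made up for this example) with no match in the second, and uses io.StringIO as a stand-in for the real CSV paths. Since the files are large, it also uses the usecols parameter of read_csv so that b_col is skipped at read time instead of being dropped afterwards.

```python
import io
import pandas as pd

# Stand-ins for the two CSV files; 1004.04 is a made-up key with no match in csv_b.
csv_a = io.StringIO("EntityNum,foo\n1001.01,100\n1002.02,50\n1004.04,10\n")
csv_b = io.StringIO("EntityNum,a_col,b_col\n1001.01,alice,7\n1002.02,bob,8\n1003.03,777,9\n")

df_a = pd.read_csv(csv_a)
# usecols loads only the listed columns, so b_col is never read.
df_b = pd.read_csv(csv_b, usecols=["EntityNum", "a_col"])

inner = df_a.merge(df_b, on="EntityNum")             # default: how="inner"
left = df_a.merge(df_b, on="EntityNum", how="left")  # keeps every row of df_a

print(len(inner), len(left))
print(left)
```

Here the inner join yields two rows, while the left join keeps all three rows of df_a and fills a_col with NaN for the unmatched key.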