Pandas: Merge two DataFrames without some columns

Question

I'm trying to merge two big CSV files together.

Let's say I have one pandas DataFrame like the following:

EntityNum    foo   ...
------------------------
1001.01      100
1002.02       50
1003.03      200

And another one like this...

EntityNum    a_col    b_col
-----------------------------------
1001.01      alice        7  
1002.02        bob        8
1003.03        777        9

I want to join them like this:

EntityNum    foo    a_col
----------------------------
1001.01      100    alice
1002.02       50      bob
1003.03      200      777

Keep in mind that I don't want b_col in the final result. How do I accomplish this with pandas?

In SQL, I would probably have written something like:

SELECT t1.*, t2.a_col FROM table_1 as t1
                      LEFT JOIN table_2 as t2
                      ON t1.EntityNum = t2.EntityNum; 

Search

I know it is possible to use merge. This is what I've tried:

import pandas as pd

df_a = pd.read_csv(path_a, sep=',')
df_b = pd.read_csv(path_b, sep=',')
df_c = pd.merge(df_a, df_b, on='EntityNum')

But I'm stuck when it comes to avoiding some of the unwanted columns in the final dataframe.

Recommended answer

You can first select the relevant columns of each DataFrame by label (e.g. df_a[['EntityNum', 'foo']]) and then join those selections.

df_a[['EntityNum', 'foo']].merge(df_b[['EntityNum', 'a_col']], on='EntityNum', how='left')
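To try the one-liner above without the CSV files, here is a minimal sketch that rebuilds the two sample tables from the question in memory. The literal values are copied from the question; the `result` name is only illustrative. It keeps all of df_a, mirroring `t1.*` in the SQL, and selects only the key and a_col from df_b.

```python
import pandas as pd

# Sample data taken from the tables in the question
df_a = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "foo": [100, 50, 200],
})
df_b = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1003.03],
    "a_col": ["alice", "bob", "777"],
    "b_col": [7, 8, 9],
})

# Keep only the join key and a_col from df_b, so b_col never reaches the result
result = df_a.merge(df_b[["EntityNum", "a_col"]], on="EntityNum", how="left")
print(result)
```

The join key has to be part of the selection from df_b, otherwise merge has nothing to match on.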

Note that the default behavior for merge is to do an inner join.
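The difference only shows up when a key is missing from one side. The sketch below uses made-up data (the 1004.04 row is not from the question) to compare the default inner join with how='left', which is the pandas counterpart of the SQL LEFT JOIN.

```python
import pandas as pd

# Hypothetical data: 1004.04 exists only in the left frame
left = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02, 1004.04],
    "foo": [100, 50, 300],
})
right = pd.DataFrame({
    "EntityNum": [1001.01, 1002.02],
    "a_col": ["alice", "bob"],
})

# Default how="inner": only keys present in both frames survive
inner = left.merge(right, on="EntityNum")

# how="left": every row of `left` is kept; a_col is NaN where there is no match
left_join = left.merge(right, on="EntityNum", how="left")

print(inner)
print(left_join)
```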
