Merge two datasets in Pandas

Question

I have previously worked with Stata and am now trying to do the same work in Python. However, I am having trouble with the merge command, and I must be missing something. The two dataframes I want to merge look like this:

 df1:
 Date id Market_Cap
 2000 1  400
 2000 2  200
 2001 1  410
 2001 2  220

 df2:
 id Ticker
 1   Shell
 2   ExxonMobil

My aim is to get the following dataset:

Date id Market_Cap  Ticker
2000 1  400        Shell 
2000 2  200        ExxonMobil 
2001 1  410        Shell 
2001 2  220        ExxonMobil

I tried the following command:

merged = pd.merge(df1, df2, how="left", on="id")

This merges the datasets, but the Ticker column contains only NaN. I looked at several sources and may be mistaken, but isn't how="left" the right choice for my purpose? I also tried "right" and "outer", which do not give the result I want, and "inner" does not seem to work here at all.

Am I missing something crucial?

Recommended answer

The problem is that your id column has dtype object (evidently strings) in one DataFrame and int in the other, so no keys match and you get NaN.

If both columns have the same dtype, the merge works:

print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64

merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
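
For readers who want to run the merge above end to end, here is a minimal, self-contained sketch. It assumes the sample values from the question and builds both frames with integer id columns; how the frames are built is an assumption, since the original post does not show how df1 and df2 were loaded.

```python
import pandas as pd

# Sample data copied from the question; both id columns hold integers here.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": [1, 2, 1, 2],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# A left merge keeps every row of df1 and looks up the matching Ticker by id.
merged = pd.merge(df1, df2, how="left", on="id")
print(merged)
```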

Another solution, if you only need to add one new column, is map:

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

Simulating your problem:

print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date id  Market_Cap Ticker
0  2000  1         400    NaN
1  2000  2         200    NaN
2  2001  1         410    NaN
3  2001  2         220    NaN
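
The mismatch can be reproduced from scratch with the sketch below. It assumes the ids in df1 were read as strings (for example from a text file), which is a guess about the asker's data. The lookup uses map because, depending on the pandas version, pd.merge on a string key and an integer key may raise an error instead of returning NaN.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],  # strings (assumed), not integers
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# The lookup Series is indexed by integer ids, so the string keys never match.
lookup = df2.set_index("id")["Ticker"]
tickers = df1["id"].map(lookup)

print(tickers.isna().all())  # every lookup missed
print("1" == 1)              # the string "1" and the integer 1 are different keys
```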

The solution is to convert the id column in df1 to int with astype (or to convert the id column in df2 to str):

df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
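
The same fix also works with pd.merge. The sketch below again assumes string ids in df1, and additionally uses the indicator=True option of pd.merge to check that no row was left unmatched. That check is an addition for illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],  # strings (assumed), as in the failing case
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Align the key dtypes before merging.
df1["id"] = df1["id"].astype(int)

# indicator=True adds a _merge column marking where each row was found.
merged = pd.merge(df1, df2, how="left", on="id", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(len(unmatched))

# Drop the helper column once the check is done.
merged = merged.drop(columns="_merge")
print(merged)
```

If unmatched is not empty, the rows it contains show which ids have no counterpart in df2.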
