Merge two datasets in Pandas


Question

I have previously worked with Stata and am now trying to do the same work in Python. However, I am having trouble with the merge command; I must be missing something. The two dataframes I want to merge look like this:

 df1:
 Date id Market_Cap
 2000 1  400
 2000 2  200
 2001 1  410
 2001 2  220

 df2:
 id Ticker
 1   Shell
 2   ExxonMobil

My aim is to get the following dataset:

Date id Market_Cap  Ticker
2000 1  400        Shell 
2000 2  200        ExxonMobil 
2001 1  410        Shell 
2001 2  220        ExxonMobil

I tried the following command:

merged = pd.merge(df1, df2, how="left", on="id")

This merges the datasets, but gives me only NaN values in the Ticker column. I looked at several sources and maybe I am mistaken, but isn't "left" the right option for my purpose? I also tried "right" and "outer". They don't give the result I want, and "inner" does not seem to work here in general.

Am I missing something crucial?

Recommended answer

The problem is that the id column is object (evidently string) in one DataFrame and int in the other, so no keys match and you get NaN.
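One way to confirm this diagnosis is to compare the key columns directly. The sketch below rebuilds the two frames from the question; storing the ids in df1 as strings is an assumption made here to reproduce the mismatch, since the question does not show how the data was loaded.

```python
import pandas as pd

# Hypothetical reconstruction of the question's data. The string ids in df1
# are an assumption that reproduces the dtype mismatch.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Compare the dtypes of the key column on both sides.
dtypes_match = df1["id"].dtype == df2["id"].dtype
print(df1["id"].dtype, df2["id"].dtype, dtypes_match)

# The string "1" never equals the integer 1, so the two key columns
# have no values in common and nothing can be matched.
shared_keys = set(df1["id"]) & set(df2["id"])
print(shared_keys)
```

If the dtypes differ and the set of shared keys is empty, the join has nothing to match on, which is consistent with a Ticker column full of NaN.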

If both columns have the same dtype:

print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64

merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
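As an optional safeguard that is not part of the original answer, pd.merge can also check the join itself. The sketch below assumes the same data as above with integer ids on both sides; validate="many_to_one" asks pandas to verify that each id appears at most once in df2, and indicator=True adds a _merge column that shows which rows found a match.

```python
import pandas as pd

# Same data as in the question, with integer ids on both sides.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": [1, 2, 1, 2],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# validate raises an error if an id is duplicated in df2;
# indicator adds a "_merge" column with values left_only / right_only / both.
checked = pd.merge(df1, df2, how="left", on="id",
                   validate="many_to_one", indicator=True)

# Rows marked "left_only" found no partner in df2.
unmatched = checked[checked["_merge"] == "left_only"]
print(len(unmatched))

# Drop the helper column once the check has passed.
merged = checked.drop(columns="_merge")
print(merged)
```

A nonzero count of left_only rows would point to keys that did not match, which is the symptom described in the question.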

Another solution, if you only need to add one new column, is map:

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil

Simulating your problem:

print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64

df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date id  Market_Cap Ticker
0  2000  1         400    NaN
1  2000  2         200    NaN
2  2001  1         410    NaN
3  2001  2         220    NaN

The fix is then to convert the column to int with astype (or to convert the id column in df2 to str):

df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
   Date  id  Market_Cap      Ticker
0  2000   1         400       Shell
1  2000   2         200  ExxonMobil
2  2001   1         410       Shell
3  2001   2         220  ExxonMobil
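The conversion and the merge can also be wrapped in one step. The helper below, merge_with_int_key, is a hypothetical name introduced here for illustration and is not part of pandas or of the original answer. It is a minimal sketch that assumes every key value can be parsed as a whole number; pd.to_numeric raises an error otherwise.

```python
import pandas as pd

def merge_with_int_key(left, right, key):
    """Hypothetical helper: cast `key` to int64 on both sides, then left-merge."""
    # assign returns a copy, so the caller's frames are not modified.
    left = left.assign(**{key: pd.to_numeric(left[key]).astype("int64")})
    right = right.assign(**{key: pd.to_numeric(right[key]).astype("int64")})
    return left.merge(right, how="left", on=key)

# Assumed input: string ids in df1, integer ids in df2.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

result = merge_with_int_key(df1, df2, "id")
print(result)
```

Casting both sides, rather than only the one known to be wrong, keeps the helper usable when it is unclear which frame holds the string keys.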
