Merge two datasets in Pandas
Question
I have previously worked with Stata and am now trying to do the same work in Python. However, I am having trouble with the merge command, and I must be missing something. The two dataframes I want to merge look like this:
df1:
Date id Market_Cap
2000 1 400
2000 2 200
2001 1 410
2001 2 220
df2:
id Ticker
1 Shell
2 ExxonMobil
My aim is to get the following dataset:
Date id Market_Cap Ticker
2000 1 400 Shell
2000 2 200 ExxonMobil
2001 1 410 Shell
2001 2 220 ExxonMobil
I tried the following command:
merged = pd.merge(df1, df2, how="left", on="id")
This merges the datasets but gives me only NaN values in the Ticker column. I looked at several sources and maybe I am mistaken, but isn't how="left" the right choice for my purpose? I also tried "right" and "outer". They don't give the result I want, and "inner" does not seem to work here at all.
Am I missing something crucial?
Recommended answer
The problem is that the id column is of dtype object (evidently string) in one DataFrame and int in the other, so no keys match and you get NaN.
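To make the mismatch concrete, here is a minimal sketch that rebuilds the two frames from the question, assuming df1 holds its ids as strings (the question does not say which frame has the string column, so that choice is an assumption). It checks the dtypes and shows that the two key columns share no values. Depending on the pandas version, merging such columns may produce NaN as in the question or be rejected with an error.

```python
import pandas as pd

# Assumption: df1 stores id as strings, df2 stores id as integers.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Inspect the key dtypes on both sides before merging.
print(df1["id"].dtype)
print(df2["id"].dtype)

# "1" (str) and 1 (int) are different keys, so the intersection is empty.
shared_keys = set(df1["id"]) & set(df2["id"])
print(shared_keys)
```

An empty intersection here is a quick sign that a join on id cannot find any matches.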
If both columns have the same dtype:
print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64
merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
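Coming from Stata, it may help to have something like the _merge variable to confirm that every row found a partner. The sketch below is not part of the original answer; it uses the indicator and validate parameters of pd.merge on frames rebuilt from the question, assuming both id columns are already integers.

```python
import pandas as pd

# Both id columns are integers here (assumption: dtypes already aligned).
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": [1, 2, 1, 2],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# indicator=True adds a "_merge" column ("both", "left_only", "right_only");
# validate="many_to_one" raises if id is not unique in df2.
merged = pd.merge(
    df1, df2, how="left", on="id",
    indicator=True, validate="many_to_one",
)

# Rows that did not find a match in df2.
unmatched = merged[merged["_merge"] != "both"]
print(merged["_merge"].value_counts())
print(len(unmatched))
```

If unmatched is not empty, the key columns are worth inspecting for dtype or formatting differences.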
Another solution, if you only need to add one new column, is map:
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
Simulating your problem:
print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 NaN
1 2000 2 200 NaN
2 2001 1 410 NaN
3 2001 2 220 NaN
The solution is to convert the id column in df1 to int with astype (or the id column in df2 to str):
df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
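The same conversion also fixes the merge command from the question. Here is a minimal, self-contained sketch, again assuming the string ids are in df1:

```python
import pandas as pd

# Assumption: df1 holds id as strings, as in the simulated problem.
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Align the key dtype first, then run the original left merge.
df1["id"] = df1["id"].astype(int)
merged = pd.merge(df1, df2, how="left", on="id")
print(merged)
```

Converting to int fails loudly if any id is not a clean integer string, which is usually preferable to silently unmatched rows.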