Merge two datasets in Pandas
Question
I have previously worked with Stata and am now trying to do the same work in Python. However, I am having trouble with the merge command, and I must be missing something. The two dataframes I want to merge look like this:
df1:
Date id Market_Cap
2000 1 400
2000 2 200
2001 1 410
2001 2 220
df2:
id Ticker
1 Shell
2 ExxonMobil
My aim is to get the following dataset:
Date id Market_Cap Ticker
2000 1 400 Shell
2000 2 200 ExxonMobil
2001 1 410 Shell
2001 2 220 ExxonMobil
I tried the following command:
merged = pd.merge(df1, df2, how="left", on="id")
This merges the datasets, but gives me only NaNs in the Ticker column. I looked at several sources and maybe I am mistaken, but isn't how="left" the right thing to do for my purpose? I also tried "right" and "outer". They don't give the result I want, and "inner" does not seem to work here at all.
Am I missing something crucial?
Recommended answer
The problem is that the id column has dtype object (evidently string) in one DataFrame and int in the other, so no keys match and you get NaN.
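A minimal sketch of how one might confirm this diagnosis. The frames below are hypothetical reconstructions, and storing df1's ids as strings is an assumption about how the data was loaded:

```python
import pandas as pd

# Hypothetical reconstruction: ids stored as strings in df1, integers in df2.
df1 = pd.DataFrame({"id": ["1", "2", "1", "2"]})
df2 = pd.DataFrame({"id": [1, 2]})

# The two key columns report different dtypes.
print(df1["id"].dtype)
print(df2["id"].dtype)

# The string "1" is not equal to the integer 1, so no key in df1
# is found among the keys of df2.
matches = df1["id"].isin(df2["id"])
print(matches.any())
```

If the dtypes differ and no keys match, aligning the dtypes of the key columns is the likely fix.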
If both columns have the same dtype, the merge works:
print (df1['id'].dtypes)
int64
print (df2['id'].dtypes)
int64
merged = pd.merge(df1, df2, how="left", on="id")
print (merged)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
Another solution, if you only need to add one new column, is map:
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
Simulating your problem:
print (df1['id'].dtypes)
object
print (df2['id'].dtypes)
int64
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 NaN
1 2000 2 200 NaN
2 2001 1 410 NaN
3 2001 2 220 NaN
Then convert the column with astype (or convert the id column in df2 to str instead):
df1['id'] = df1['id'].astype(int)
#alternatively
#df2['id'] = df2['id'].astype(str)
df1['Ticker'] = df1['id'].map(df2.set_index('id')['Ticker'])
print (df1)
Date id Market_Cap Ticker
0 2000 1 400 Shell
1 2000 2 200 ExxonMobil
2 2001 1 410 Shell
3 2001 2 220 ExxonMobil
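Putting the pieces together, here is a self-contained sketch that rebuilds the frames from the question, aligns the key dtypes, and then merges. It assumes the ids in df1 were read in as strings, which is a guess about the original data:

```python
import pandas as pd

# Frames from the question; df1's id is stored as strings here to
# reproduce the mismatch (an assumption about how the data was loaded).
df1 = pd.DataFrame({
    "Date": [2000, 2000, 2001, 2001],
    "id": ["1", "2", "1", "2"],
    "Market_Cap": [400, 200, 410, 220],
})
df2 = pd.DataFrame({"id": [1, 2], "Ticker": ["Shell", "ExxonMobil"]})

# Cast the key column so both frames use integer ids.
df1["id"] = df1["id"].astype(int)

# A left merge keeps every row of df1 and attaches the matching Ticker.
merged = pd.merge(df1, df2, how="left", on="id")
print(merged)
```

Casting df2's id to str instead would work equally well. What matters is that both key columns share one dtype before the merge.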