Join two dataframes on a common column
Question
I want to join two data sources, orders and customers:
orders is a SQL Server table:
orderid| customerid | orderdate | ordercost
------ | -----------| --------- | --------
12000 | 1500 |2008-08-09 | 38610
and customers is a CSV file:
customerid,first_name,last_name,starting_date,ending_date,country
1500,Sian,Read,2008-01-07,2010-01-07,Greenland
I want to join these two tables in my Python application, so I wrote the following code:
import pandas as pd
import pypyodbc

# Connect to SQL Server with the pypyodbc library
connection = pypyodbc.connect("connection string here")
cursor = connection.cursor()
cursor.execute("SELECT * FROM orders")
result = cursor.fetchall()
# Convert the result to a pandas DataFrame
df1 = pd.DataFrame(result, columns=['orderid', 'customerid', 'orderdate', 'ordercost'])
# Read the CSV file
df2 = pd.read_csv(customer_csv)
# Merge the two DataFrames
merged = pd.merge(df1, df2, on='customerid', how='inner')
print(merged[['first_name', 'country']])
I expected:
first_name | country
-----------|--------
Sian | Greenland
But I get an empty result.
When I run this code on two data frames that both come from CSV files, it works fine. Any help?
Thanks.
Recommended answer
I think the problem is that the customerid column has a different dtype in each of the two DataFrames, so no rows match.
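One way to check this is to compare the dtype of the key column in each frame. The following is only a minimal sketch: the two small frames are hand-built stand-ins for the SQL result and the CSV file, and it assumes the SQL side delivers customerid as strings, which may not be what your driver actually returns.

```python
import pandas as pd

# Stand-in for the SQL Server result: the key is ASSUMED to arrive as strings
df1 = pd.DataFrame({"customerid": ["1500"], "ordercost": [38610]})
# Stand-in for the CSV file: read_csv would normally infer integers for this column
df2 = pd.DataFrame({"customerid": [1500], "first_name": ["Sian"]})

dtype_sql = df1["customerid"].dtype
dtype_csv = df2["customerid"].dtype
print(dtype_sql, dtype_csv)

# False here means the keys cannot be compared as-is
same_dtype = dtype_sql == dtype_csv
print(same_dtype)
```

If the two dtypes differ in your real data, that would be consistent with the empty result.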
So you need to convert both columns to int, or both to str.
df1['customerid'] = df1['customerid'].astype(int)
df2['customerid'] = df2['customerid'].astype(int)
Or:
df1['customerid'] = df1['customerid'].astype(str)
df2['customerid'] = df2['customerid'].astype(str)
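Put together, the fix might look like the sketch below. It uses the sample rows from the question, with an in-memory string standing in for the CSV file and a hand-built frame standing in for the SQL result. The string-typed key on the SQL side is an assumption made to reproduce the mismatch.

```python
import io
import pandas as pd

# Stand-in for the SQL result; the key is ASSUMED to come back as strings
df1 = pd.DataFrame(
    [("12000", "1500", "2008-08-09", "38610")],
    columns=["orderid", "customerid", "orderdate", "ordercost"],
)

# Stand-in for the CSV file, using the sample row from the question
csv_text = (
    "customerid,first_name,last_name,starting_date,ending_date,country\n"
    "1500,Sian,Read,2008-01-07,2010-01-07,Greenland\n"
)
df2 = pd.read_csv(io.StringIO(csv_text))

# Bring both key columns to the same dtype before merging
df1["customerid"] = df1["customerid"].astype(int)
df2["customerid"] = df2["customerid"].astype(int)

merged = pd.merge(df1, df2, on="customerid", how="inner")
print(merged[["first_name", "country"]])
```

Converting to int fails if any key is not a valid integer (for example, a value with stray whitespace or letters), so converting both sides to str can be the safer choice for messy data.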
You can also omit how='inner', because it is the default for merge:
merged= pd.merge( df1, df2, on= 'customerid')
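As a quick check that the two calls are equivalent, here is a small sketch with made-up frames (the second order row is hypothetical and has no matching customer):

```python
import pandas as pd

# Hypothetical data: customer 1501 has no match on the right-hand side
left = pd.DataFrame({"customerid": [1500, 1501], "ordercost": [38610, 120]})
right = pd.DataFrame({"customerid": [1500], "country": ["Greenland"]})

default_merge = pd.merge(left, right, on="customerid")
inner_merge = pd.merge(left, right, on="customerid", how="inner")

# Both calls keep only the keys present in both frames
same = default_merge.equals(inner_merge)
print(same)
```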