Pandas to_sql() to update unique values in DB?
Question
How can I use df.to_sql(if_exists='append') to append ONLY the unique values, i.e. the rows of the DataFrame that are not already in the database? In other words, I would like to find the duplicates between the DF and the DB and drop them before writing to the database.
Is there a parameter for this?
I understand that the parameters if_exists='append' and if_exists='replace' apply to the entire table, not to individual unique entries.
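A minimal sketch of that behavior, assuming an in-memory SQLite database opened with Python's built-in sqlite3 module (to_sql accepts a sqlite3 connection directly) and a hypothetical table named prices with no key constraints:

```python
import sqlite3

import pandas as pd

# Hypothetical two-row DataFrame; 'key' stands in for the real primary-key columns
df = pd.DataFrame({"key": ["a", "b"], "value": [1.0, 2.0]})

con = sqlite3.connect(":memory:")  # throwaway in-memory database for the demo

# Writing the same rows twice with if_exists='append' inserts them twice:
# pandas does not compare incoming rows with rows already in the table
df.to_sql("prices", con, if_exists="append", index=False)
df.to_sql("prices", con, if_exists="append", index=False)
appended = con.execute("SELECT COUNT(*) FROM prices").fetchone()[0]

# if_exists='replace' drops the table and rewrites it, discarding all prior rows
df.to_sql("prices", con, if_exists="replace", index=False)
replaced = con.execute("SELECT COUNT(*) FROM prices").fetchone()[0]

print(appended, replaced)  # 4 2
```

If the target table declares a primary key, as in this question, the second append is rejected with a duplicate-key (integrity) error instead of silently doubling the rows.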
I am using:
- SQLAlchemy
- a pandas DataFrame with the following data types:
  - index: datetime.datetime (primary key)
  - float
  - float
  - float
  - float
  - integer
  - string (primary key)
  - string (primary key)
I'm stuck on this, so your help is much appreciated. Thanks!
Recommended Answer
In pandas, to_sql has no convenient argument for appending only non-duplicate rows to a final table. Instead, consider writing to a staging table that pandas replaces on every run, and then running a final append query that migrates the staging records to the final table, keeping only rows whose primary key is not already present by using a NOT EXISTS clause.
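```python
import sqlalchemy
from sqlalchemy import text

engine = sqlalchemy.create_engine(...)

# Write the whole DataFrame to a staging table; 'replace' drops and recreates it each run
df.to_sql(name='myTempTable', con=engine, if_exists='replace')

# engine.begin() opens a transaction that commits when the block exits without error
with engine.begin() as cn:
    # Copy only staging rows whose key columns do not already exist in the final table
    sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
             SELECT t.Col1, t.Col2, t.Col3, ...
             FROM myTempTable t
             WHERE NOT EXISTS
                 (SELECT 1 FROM myFinalTable f
                  WHERE t.MatchColumn1 = f.MatchColumn1
                  AND t.MatchColumn2 = f.MatchColumn2)"""
    # Raw SQL strings must be wrapped in text(); SQLAlchemy 2.0 rejects plain strings
    cn.execute(text(sql))
```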
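A self-contained way to try the same staging pattern end to end. This sketch assumes Python's built-in sqlite3 module in place of a SQLAlchemy engine, and hypothetical column names ts, symbol, source and price loosely modeled on the question's schema (a datetime index plus two string key columns). The existing row is seeded through to_sql as well, so both tables store the datetime key in the same text format.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")

# Hypothetical final table with a composite primary key (ts, symbol, source)
con.execute("""CREATE TABLE myFinalTable (
                   ts TIMESTAMP, symbol TEXT, source TEXT, price REAL,
                   PRIMARY KEY (ts, symbol, source))""")

# Seed one existing row; the DatetimeIndex named 'ts' is written as column 'ts'
seed = pd.DataFrame(
    {"symbol": ["AAA"], "source": ["x"], "price": [1.0]},
    index=pd.DatetimeIndex(["2020-01-01"], name="ts"),
)
seed.to_sql("myFinalTable", con, if_exists="append")

# Incoming batch: the AAA row repeats an existing key, the BBB row is new
df = pd.DataFrame(
    {"symbol": ["AAA", "BBB"], "source": ["x", "x"], "price": [1.5, 2.0]},
    index=pd.DatetimeIndex(["2020-01-01", "2020-01-01"], name="ts"),
)

# Step 1: stage the whole batch; 'replace' recreates the staging table every run
df.to_sql("myTempTable", con, if_exists="replace")

# Step 2: copy only rows whose full key is absent from the final table
with con:  # a sqlite3 connection used as a context manager commits on success
    con.execute("""INSERT INTO myFinalTable (ts, symbol, source, price)
                   SELECT t.ts, t.symbol, t.source, t.price
                   FROM myTempTable t
                   WHERE NOT EXISTS
                       (SELECT 1 FROM myFinalTable f
                        WHERE t.ts = f.ts
                          AND t.symbol = f.symbol
                          AND t.source = f.source)""")

rows = con.execute(
    "SELECT symbol, price FROM myFinalTable ORDER BY symbol"
).fetchall()
print(rows)  # [('AAA', 1.0), ('BBB', 2.0)]
```

The existing AAA row keeps its original price because NOT EXISTS skips conflicting keys rather than updating them. NOT EXISTS also only checks against rows already in the final table, so if a single batch can repeat a key, deduplicate the batch on its key columns first.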
This is an ANSI SQL solution that does not rely on vendor-specific methods like UPSERT, so it works in practically all relational databases that support SQL.
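The answer above does the comparison inside the database. The filtering the question originally describes, dropping duplicates in pandas before writing, can also be sketched as a left anti-join using DataFrame.merge with indicator=True. This is a swapped-in alternative, not the answer's method, and it assumes the existing keys fit in memory. It uses an in-memory SQLite database and hypothetical string key columns symbol and source; a datetime key (like the question's index, after reset_index()) would also need matching dtypes on both sides, for example via the parse_dates argument of read_sql.

```python
import sqlite3

import pandas as pd

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE myFinalTable (symbol TEXT, source TEXT, price REAL, "
            "PRIMARY KEY (symbol, source))")
con.execute("INSERT INTO myFinalTable VALUES ('AAA', 'x', 1.0)")
con.commit()

df = pd.DataFrame({"symbol": ["AAA", "BBB"], "source": ["x", "x"], "price": [1.5, 2.0]})
keys = ["symbol", "source"]  # hypothetical key columns

# Read only the key columns already stored in the database
existing = pd.read_sql("SELECT symbol, source FROM myFinalTable", con)

# Left anti-join: keep DataFrame rows whose key combination is absent from the DB
merged = df.merge(existing, on=keys, how="left", indicator=True)
new_rows = merged.loc[merged["_merge"] == "left_only"].drop(columns="_merge")

# Only the genuinely new rows are appended, so the primary key is never violated
new_rows.to_sql("myFinalTable", con, if_exists="append", index=False)

final_rows = con.execute(
    "SELECT symbol, price FROM myFinalTable ORDER BY symbol"
).fetchall()
print(final_rows)  # [('AAA', 1.0), ('BBB', 2.0)]
```

Unlike the staging-table query, this reads every existing key into memory and leaves a window between the read and the write in which another writer could insert the same key, so the database-side NOT EXISTS version is usually the safer choice for large or shared tables.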