Pandas to_sql() to update unique values in DB?

Question

How can I use df.to_sql(if_exists='append') to append ONLY the rows in the dataframe that are not already in the database? In other words, I would like to find the duplicates between the DF and the DB and drop them before writing to the database.

Is there a parameter for this?

I understand that the parameters if_exists='append' and if_exists='replace' apply to the entire table, not to individual unique entries.

I am using: 
sqlalchemy

pandas dataframe with the following datatypes: 
    index: datetime.datetime <-- Primary Key
    float
    float
    float
    float
    integer
    string <-- Primary Key
    string <-- Primary Key

I'm stuck on this so your help is much appreciated. -Thanks

Recommended answer

In pandas, to_sql has no argument for appending only non-duplicates to a final table. Consider using a staging (temp) table that pandas always replaces, then run a final append query that migrates the staging records to the final table, inserting only rows whose primary key is not already present by means of a NOT EXISTS clause.

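import sqlalchemy
from sqlalchemy import text

engine = sqlalchemy.create_engine(...)

# Stage the dataframe; pandas replaces this table on every run
df.to_sql(name='myTempTable', con=engine, if_exists='replace')

# Append only the rows whose key is not already in the final table
sql = """INSERT INTO myFinalTable (Col1, Col2, Col3, ...)
         SELECT t.Col1, t.Col2, t.Col3, ...
         FROM myTempTable t
         WHERE NOT EXISTS
             (SELECT 1 FROM myFinalTable f
              WHERE t.MatchColumn1 = f.MatchColumn1
              AND t.MatchColumn2 = f.MatchColumn2)"""

with engine.begin() as cn:
    # SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text()
    cn.execute(text(sql))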

This is an ANSI SQL solution rather than a vendor-specific method such as UPSERT, so it works in practically all SQL-compliant relational databases.
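
For anyone who wants to try the staging-table pattern end to end, here is a minimal sketch that runs against an in-memory SQLite database. Everything named here is an assumption made for illustration and not part of the original answer: the helper append_new_rows, the tables final_table and staging_table, and the columns ts, symbol, venue and price (three key columns, loosely mirroring the composite key in the question). Timestamps are kept as ISO strings to keep the sketch simple, and a plain sqlite3 connection is passed to to_sql instead of a SQLAlchemy engine.

```python
import sqlite3

import pandas as pd


def append_new_rows(df, con, final_table, staging_table, key_cols):
    """Stage df, then insert only rows whose key is absent from final_table.

    Hypothetical helper: table and column names are interpolated into the
    SQL text, so they must come from trusted code, never from user input.
    """
    # pandas overwrites the staging table on every call
    df.to_sql(staging_table, con, if_exists="replace", index=False)

    cols = ", ".join(df.columns)
    select_cols = ", ".join(f"t.{c}" for c in df.columns)
    match = " AND ".join(f"t.{c} = f.{c}" for c in key_cols)
    sql = (
        f"INSERT INTO {final_table} ({cols}) "
        f"SELECT {select_cols} FROM {staging_table} t "
        f"WHERE NOT EXISTS (SELECT 1 FROM {final_table} f WHERE {match})"
    )
    con.execute(sql)
    con.commit()


con = sqlite3.connect(":memory:")
con.execute(
    """CREATE TABLE final_table (
           ts TEXT, symbol TEXT, venue TEXT, price REAL,
           PRIMARY KEY (ts, symbol, venue))"""
)

keys = ["ts", "symbol", "venue"]
first = pd.DataFrame({
    "ts": ["2020-01-01", "2020-01-02"],
    "symbol": ["A", "A"],
    "venue": ["X", "X"],
    "price": [1.0, 2.0],
})
# The 2020-01-02 row repeats an existing key and should be skipped
second = pd.DataFrame({
    "ts": ["2020-01-02", "2020-01-03"],
    "symbol": ["A", "A"],
    "venue": ["X", "X"],
    "price": [9.0, 3.0],
})

append_new_rows(first, con, "final_table", "staging_table", keys)
append_new_rows(second, con, "final_table", "staging_table", keys)

rows = con.execute("SELECT ts, price FROM final_table ORDER BY ts").fetchall()
print(rows)
```

The row for 2020-01-02 keeps its original price of 2.0 because the second batch's version is filtered out by NOT EXISTS. The sketch assumes keys are unique within each dataframe; if they might not be, calling df.drop_duplicates(subset=key_cols) before staging would avoid a primary key violation on insert.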
