Replacing duplicates pandas to_sql (sqlite)
Question
I am appending pandas dataframes to sqlite. My primary key is:
Datetime | UserID | CustomerID
My issue is that sometimes I get a new file containing old data that I want to append to the existing sqlite table. I am not reading that table into memory, so I can't drop_duplicates in pandas. (For example, one file is always month-to-date data and it is sent to me every day.)
How can I ensure that I am only appending unique values based on my primary key? Is there a pandas to_sql function to insert or replace when I append the new data?
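There is no built-in insert-or-replace flag on to_sql, but pandas (0.24+) accepts a custom method callable, which can issue SQLite's INSERT OR IGNORE so duplicate primary keys are silently skipped. A minimal sketch; the function name insert_or_ignore and the table/column names are illustrative, and the table must already exist with its primary key:

```python
import sqlite3

import pandas as pd


def insert_or_ignore(table, conn, keys, data_iter):
    # With a raw sqlite3 connection, pandas passes this callable a cursor;
    # "table" is pandas' SQLTable wrapper, whose .name is the table name.
    cols = ", ".join(f'"{k}"' for k in keys)
    marks = ", ".join("?" for _ in keys)
    sql = f'INSERT OR IGNORE INTO "{table.name}" ({cols}) VALUES ({marks})'
    conn.executemany(sql, list(data_iter))


conn = sqlite3.connect(":memory:")
# The table must be created with the primary key up front; otherwise
# to_sql creates it without one and nothing is deduplicated.
conn.execute(
    "CREATE TABLE t (dt TEXT, user_id INTEGER, customer_id INTEGER, "
    "PRIMARY KEY (dt, user_id, customer_id))"
)

df = pd.DataFrame({"dt": ["2023-01-01"], "user_id": [1], "customer_id": [7]})
df.to_sql("t", conn, if_exists="append", index=False, method=insert_or_ignore)
# Appending the same data again is a no-op instead of an error:
df.to_sql("t", conn, if_exists="append", index=False, method=insert_or_ignore)
```

The advantage over catching the error is that a mixed batch of old and new rows still inserts the new rows.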
Also, should I specify dtypes in pandas before writing to SQL? I got some error messages when I tried to write to SQLite while I had categorical dtypes.
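On the dtype question: depending on the pandas version, categorical columns can trigger adapter errors when writing through a raw sqlite3 connection. A common workaround, sketched here with illustrative column names, is to cast categoricals to plain strings and, optionally, pin the SQL column types via the dtype parameter (with a plain sqlite3 connection it takes SQLite type names as strings):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
df = pd.DataFrame({"user_id": [1, 2], "plan": pd.Categorical(["free", "paid"])})

# Cast the categorical to plain strings so sqlite3 receives values it can adapt
df["plan"] = df["plan"].astype(str)

# Optionally fix the SQL column types explicitly
df.to_sql("t", conn, index=False, dtype={"user_id": "INTEGER", "plan": "TEXT"})
```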
Answer
If you attempt to insert duplicate data you'll get a sqlite3.IntegrityError exception. You can catch that and do nothing, for example:
import sqlite3

import pandas as pd

try:
    # if_exists='append' adds the rows to the existing table; the table's
    # PRIMARY KEY constraint is what rejects the duplicates
    df.to_sql('t', conn, if_exists='append', index=False)
except sqlite3.IntegrityError:
    pass
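One caveat with catching IntegrityError around the whole to_sql call: the insert runs as a single transaction, so if even one row is a duplicate, the genuinely new rows in the same batch are rolled back with it. A sketch of a row-by-row fallback that keeps the new rows and skips only the true duplicates (table and column names are illustrative):

```python
import sqlite3

import pandas as pd

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO t VALUES (1, 'old')")
conn.commit()

# A mixed batch: user_id 1 already exists, user_id 2 is new
df = pd.DataFrame({"user_id": [1, 2], "value": ["dup", "new"]})

# Insert one row at a time so only the actual duplicates are skipped
for row in df.itertuples(index=False, name=None):
    try:
        conn.execute("INSERT INTO t (user_id, value) VALUES (?, ?)", row)
    except sqlite3.IntegrityError:
        pass  # duplicate primary key: skip this row only
conn.commit()
```

This is slower than a bulk insert, so it is worth it mainly when batches routinely mix old and new rows.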