pandas to_sql 插入忽略 [英] pandas to_sql insert ignore

查看:102
本文介绍了 pandas to_sql 插入忽略的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想不断地将数据框行添加到 MySQL 数据库中,避免在 MySQL 中出现任何重复的条目.

I want to incrementally keep adding data frame rows into MySQL DB avoiding any duplicate entries to go in MySQL.

我目前通过使用 df.apply() 遍历每一行并调用 MySQL insert ignore(duplicates) 将唯一行添加到 MySQL 数据库中来执行此操作.但是使用 pandas.apply 非常慢(10k 行需要 45 秒).我想使用 pandas.to_sql() 方法实现这一点,该方法需要 0.5 秒才能将 10k 条目推送到数据库中,但不支持在追加模式下忽略重复.有没有一种高效快捷的方法来实现这一目标?

I am currently doing this by looping through every row using df.apply()and calling MySQL insert ignore(duplicates) to add unique rows into MySQL database. But using pandas.apply is very slow(45 secs for 10k rows). I want to achieve this using pandas.to_sql() method which takes 0.5 secs to push 10k entries into DB but doesn't support ignore duplicate in append mode. Is there an efficient and fast way to achieve this?

输入CSV

Date,Open,High,Low,Close,Volume
1994-01-03,111.7,112.75,111.55,112.65,0
1994-01-04,112.68,113.47,112.2,112.65,0
1994-01-05,112.6,113.63,112.3,113.0,0
1994-01-06,113.02,113.43,112.25,112.62,0
1994-01-07,112.55,112.8,111.5,111.88,0
1994-01-10,111.8,112.43,111.35,112.25,0
1994-01-11,112.18,112.88,112.05,112.4,0
1994-01-12,112.38,112.82,111.95,112.28,0

代码

nifty_data.to_sql(name='eod_data', con=engine, if_exists = 'append', index=False) # option-1 
nifty_data.apply(addToDb, axis=1) # option-2 

def addToDb(row):
    sql = "INSERT IGNORE INTO eod_data (date, open, high, low, close, volume) VALUES (%s,%s,%s,%s,%s,%s)"
    val = (row['Date'], row['Open'], row['High'], row['Low'], row['Close'], row['Volume'])
    mycursor.execute(sql, val)
    mydb.commit()`

option-1:不允许插入忽略(~0.5 秒)

option-2:必须循环并且非常慢(约 45 秒)

推荐答案

可以创建临时表:

nifty_data.to_sql(name='temporary_table', con=engine, if_exists = 'append', index=False)

然后运行一个 INSERT IGNORE 语句:

And then run an INSERT IGNORE statement from that:

with engine.begin() as cnx:
    insert_sql = 'INSERT IGNORE INTO eod_data (SELECT * FROM temporary_table)'
    cnx.execute(insert_sql)

只需确保列顺序相同,否则您可能需要手动声明它们.

just make sure the column orders are the same or you might have to manually declare them.

这篇关于 pandas to_sql 插入忽略的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆