Speed up inserts into SQL Server from pyodbc

Question

In Python, I have a process that selects data from one database (Redshift, via psycopg2) and then inserts that data into SQL Server (via pyodbc). I chose a read/write approach rather than read / flat file / bulk load because the row count is only around 100,000 per day; it seemed easier to simply connect and insert. However, the insert process is slow, taking several minutes.

Is there a better way to insert data into SQL Server with pyodbc?

import pyodbc

# select_cursor is a cursor on an already-open psycopg2 connection to Redshift
# (connection setup omitted)
select_cursor.execute(output_query)

done = False
rowcount = 0

while not done:
    # pull the source rows down in batches of 10,000
    rows = select_cursor.fetchmany(10000)

    insert_list = []

    if not rows:
        done = True
        break

    for row in rows:
        rowcount += 1

        insert_params = (
            row[0],
            row[1],
            row[2]
        )

        insert_list.append(insert_params)

    # note: this opens (and below closes) a new SQL Server connection for every batch
    insert_cnxn = pyodbc.connect('''Connection Information''')

    insert_cursor = insert_cnxn.cursor()

    insert_cursor.executemany("""
        INSERT INTO Destination (AccountNumber, OrderDate, Value)
        VALUES (?, ?, ?)
        """, insert_list)

    insert_cursor.commit()
    insert_cursor.close()
    insert_cnxn.close()

select_cursor.close()
select_cnxn.close()

Answer

UPDATE: pyodbc 4.0.19 added a Cursor#fast_executemany option that can greatly improve performance by avoiding the behaviour described below. See this answer for details.
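
For example, with pyodbc 4.0.19 or later, enabling that option is a one-line change on the cursor. A minimal sketch, reusing the placeholder connection string and the insert_list of row tuples from the question:

import pyodbc

insert_cnxn = pyodbc.connect('''Connection Information''')
insert_cursor = insert_cnxn.cursor()

# send the parameters as packed arrays in a single round-trip,
# rather than one sp_prepexec call per row
insert_cursor.fast_executemany = True

insert_cursor.executemany("""
    INSERT INTO Destination (AccountNumber, OrderDate, Value)
    VALUES (?, ?, ?)
    """, insert_list)
insert_cnxn.commit()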

Your code does follow proper form (aside from the few minor tweaks mentioned in the other answer), but be aware that when pyodbc performs an .executemany, what it actually does is submit a separate sp_prepexec for each individual row. That is, for the code:

sql = "INSERT INTO #Temp (id, txtcol) VALUES (?, ?)"
params = [(1, 'foo'), (2, 'bar'), (3, 'baz')]
crsr.executemany(sql, params)

SQL Server actually performs the following (as confirmed by SQL Profiler):

exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',1,N'foo'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',2,N'bar'
exec sp_prepexec @p1 output,N'@P1 bigint,@P2 nvarchar(3)',N'INSERT INTO #Temp (id, txtcol) VALUES (@P1, @P2)',3,N'baz'

So, for an .executemany "batch" of 10,000 rows you would be

  • performing 10,000 individual inserts,
  • with 10,000 round-trips to the server, and
  • sending the identical SQL command text (INSERT INTO ...) 10,000 times.

It is possible to have pyodbc send an initial sp_prepare and then do an .executemany calling sp_execute, but the nature of .executemany is that you still would do 10,000 sp_prepexec calls, just executing sp_execute instead of INSERT INTO .... That could improve performance if the SQL statement was quite long and complex, but for a short one like the example in your question it probably wouldn't make all that much difference.

One could also get creative and build "table value constructors" as illustrated in this answer, but notice that it is only offered as a "Plan B" when native bulk insert mechanisms are not a feasible solution.
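
For reference, a minimal sketch of that "Plan B" built from the question's placeholders: each INSERT sends a chunk of rows as a single multi-row VALUES list (a table value constructor). The chunk size of 500 is an assumption chosen to respect SQL Server's limits of 1,000 row expressions per VALUES list and 2,100 parameters per statement:

# build one multi-row INSERT per chunk instead of one INSERT per row
chunk_size = 500  # 500 rows x 3 params = 1,500 parameters, under both limits

for start in range(0, len(insert_list), chunk_size):
    chunk = insert_list[start:start + chunk_size]
    placeholders = ", ".join("(?, ?, ?)" for _ in chunk)
    sql = ("INSERT INTO Destination (AccountNumber, OrderDate, Value) "
           "VALUES " + placeholders)
    flat_params = [value for row in chunk for value in row]  # flatten the tuples
    insert_cursor.execute(sql, flat_params)

insert_cnxn.commit()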
