pymssql executemany 插入值很慢 [英] pymssql executemany insert value very slow

查看:35
本文介绍了pymssql executemany 插入值很慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

python-2.7.15、pymssql-2.1.4、SQL_Server-2018、Windows 10 Pro、MS-Office-2016

python-2.7.15, pymssql-2.1.4, SQL_Server-2018, Windows 10 Pro, MS-Office-2016

import time
import csv
import pymssql

db_settings = {
    "host" : "127.0.0.1",
    "port" : "1433",
    "user" : "sa",
    "password" : "********",
    "database" : "testdb",
    "charset" : "utf8"
               }
conn = pymssql.connect(**db_settings)
cursor = conn.cursor()
ff = csv.reader(open('base.csv', 'r'))
sql = """
    BEGIN
        INSERT INTO Base([name], [year], [update], [status], 
    [timeline], [language], [pic]) VALUES (%s, %s, %s, %s, %s, %s, %s)
    END
    """
now=time.strftime("%M:%S")
t = []
for i in ff:
    i = i[1:]
    if "year" in i:
        pass
    else:
        t.append((i[0], i[1], i[3], i[4], i[6], i[5], i[8]))
cursor.executemany(sql, t)
conn.commit()

end=time.strftime("%M:%S")

print(now+","+end)

base.csv"文件大小为 21.7 MB 和 30374 行.当我执行上述代码时,需要 929 秒才能完成.这意味着只有 32.7 行/秒,太慢了.谁能帮我找出原因?非常感谢.:-)

The file of "base.csv" size is 21.7 MB and 30374 rows. When I execute the above code, It will take 929 seconds to completed. This is meaning only 32.7 rows/second, it too slow. Who can to help me find out the reason?Thank a lot. :-)

推荐答案

我像这样将 pymssql 中 execute_many 的时间从 30 分钟减少到 30 秒.

I reduced time of execute_many in pymssql from 30min to 30s like this.

在 sql 中,您可以一次创建多行的插入语句.如下图

In sql you can create insert statements with multiple rows at once. It looks like below

INSERT (col_name1, col_name2) 
INTO table_name 
VALUES 
(row1_val1, row1_val2), 
(row2_val1, row2_val2) ... 
(row1000_val1, row1000_val2)

我实现了插入函数,该函数获取数据块并修改查询以通过一次执行插入多个值.

I implemented insert function which gets chunks of data and modifies the query to insert multiple values with one execute.

def insert(query, data, chunk=999):
    conn = get_connection()
    cursor = conn.cursor()
    query = query.lower()
    insert_q, values_q = query.split('values') # get part with the query and the parameters
    insert_q += 'values' # add values to make sql query correct after split
    for chunk_data in chunks(data, chunk):
        # chunk_data contains list of row parameters
        flat_list = [item for sublist in chunk_data for item in sublist] # we make it flat to use execute later instead execute_many
        chunk_query = insert_q + ','.join([values_q] * len(chunk_data)) # creating the query with multiple values insert
        cursor.execute(chunk_query, tuple(flat_list)
        conn.commit()

chunks 可以这样实现(感谢本论坛的一个很棒的回复)

chunks can be implemented like this (thanks to on of the great reply from this forum)

def chunks(lst, n):
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

示例用法

insert('INSERT (user_id, name, surname) INTO users VALUES (%s, %s, %s)',
       [(1, 'Jack', 'Kcaj'), (2, 'Andrew', 'Golara')]

这篇关于pymssql executemany 插入值很慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆