How to speed up bulk insert to MS SQL Server from CSV using pyodbc

Views: 105 Published: 2020/9/24 5:05:59 Tags: python sql-server sql-server-2012 bulkinsert pyodbc

Question

Below is the code I'd like some help with. I have to run it over 1,300,000 rows, and it takes up to 40 minutes to insert ~300,000 rows.

I figure BULK INSERT is the way to speed it up? Or is it slow because I'm iterating over the rows in the "for data in reader:" loop?

import csv
import os

# Opens the prepped CSV file
with open(os.path.join(newpath, outfile), 'r') as f:
    # hooks csv reader to file
    reader = csv.reader(f)
    # pulls out the columns (which match the SQL table)
    columns = next(reader)
    # trims any extra spaces
    columns = [x.strip(' ') for x in columns]
    # starts SQL statement (a plain INSERT; "bulk insert into ... values" is not valid T-SQL)
    query = 'insert into SpikeData123({0}) values ({1})'
    # puts column names and one "?" placeholder per column into the query
    query = query.format(','.join(columns), ','.join('?' * len(columns)))

    print('Query is: %s' % query)
    # starts cursor from cnxn (which works)
    cursor = cnxn.cursor()
    # uploads everything row by row, committing after each row
    for data in reader:
        cursor.execute(query, data)
        cursor.commit()

I am picking my column headers dynamically on purpose (I would like the code to be as Pythonic as possible).

SpikeData123 is the table name.

Recommended answer

Update: As noted in a comment from @SimonLang, BULK INSERT under SQL Server 2017 and later apparently does support text qualifiers in CSV files (ref: here).
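For SQL Server 2017 and later, the statement could use the CSV parser options FORMAT = 'CSV' and FIELDQUOTE. The following is only a minimal sketch of building such a statement in Python: the helper name build_bulk_insert_csv and the file name biTest.csv are hypothetical, the table name is assumed to come from a trusted source, and FIRSTROW = 2 is assumed to be an acceptable way to skip a single header line (worth checking against the BULK INSERT documentation for your version).

```python
def build_bulk_insert_csv(table, path, first_row=2):
    """Return a BULK INSERT statement that uses the CSV parser (SQL Server 2017+)."""
    # Double any single quotes so the path stays a valid T-SQL string literal.
    safe_path = path.replace("'", "''")
    return (
        f"BULK INSERT {table}\n"
        f"FROM '{safe_path}'\n"
        "WITH (\n"
        "    FORMAT = 'CSV',\n"       # parse the file as quoted CSV
        "    FIELDQUOTE = '\"',\n"    # the text qualifier character
        f"    FIRSTROW = {int(first_row)}\n"
        ");"
    )


# Hypothetical table and file names, for illustration only.
sql = build_bulk_insert_csv("myDb.dbo.SpikeData123", r"C:\__tmp\biTest.csv")
print(sql)
```

The resulting string would be passed to crsr.execute(sql) in the same way as any other BULK INSERT statement, and the file must still be readable by the SQL Server instance itself.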

BULK INSERT will almost certainly be much faster than reading the source file row by row and doing a regular INSERT for each row. However, both BULK INSERT and BCP have a significant limitation regarding CSV files: in versions before SQL Server 2017 they cannot handle text qualifiers (ref: here). That is, if your CSV file does not have qualified text strings in it ...

1,Gord Thompson,2015-04-15
2,Bob Loblaw,2015-04-07

... then you can BULK INSERT it, but if it contains text qualifiers (because some text values contain commas) ...

1,"Thompson, Gord",2015-04-15
2,"Loblaw, Bob",2015-04-07

... then BULK INSERT cannot handle it. Still, it might be faster overall to pre-process such a CSV file into a pipe-delimited file ...

1|Thompson, Gord|2015-04-15
2|Loblaw, Bob|2015-04-07

... or a tab-delimited file (where → represents the tab character) ...

1→Thompson, Gord→2015-04-15
2→Loblaw, Bob→2015-04-07

... and then BULK INSERT that file. For the latter (tab-delimited) file, the BULK INSERT code would look something like this:

import pypyodbc
conn_str = "DSN=myDb_SQLEXPRESS;"
cnxn = pypyodbc.connect(conn_str)
crsr = cnxn.cursor()
sql = """
BULK INSERT myDb.dbo.SpikeData123
FROM 'C:\\__tmp\\biTest.txt' WITH (
    FIELDTERMINATOR='\\t',
    ROWTERMINATOR='\\n'
    );
"""
crsr.execute(sql)
cnxn.commit()
crsr.close()
cnxn.close()
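The pre-processing step mentioned above (turning a CSV with text qualifiers into a tab-delimited file) could be sketched with Python's csv module roughly as follows. The function name csv_to_tab_delimited is hypothetical, and the sketch assumes that no field contains a tab or a line break.

```python
import csv
import io


def csv_to_tab_delimited(src, dst):
    """Copy rows from a quoted CSV stream to a tab-delimited stream."""
    reader = csv.reader(src)
    # QUOTE_NONE writes fields without text qualifiers. With no escapechar set,
    # csv raises csv.Error if a field contains a tab (or another character that
    # would need escaping) instead of silently writing a broken row.
    writer = csv.writer(dst, delimiter="\t", quoting=csv.QUOTE_NONE,
                        lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow(row)
        count += 1
    return count


# In-memory demonstration using the sample rows from the answer.
src = io.StringIO('1,"Thompson, Gord",2015-04-15\n2,"Loblaw, Bob",2015-04-07\n')
dst = io.StringIO()
n = csv_to_tab_delimited(src, dst)
print(dst.getvalue())
```

With real files, both would typically be opened with newline="" (as the csv module documentation recommends) and the output path would be the one named in the FROM clause of the BULK INSERT statement.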

Note: As mentioned in a comment, executing a BULK INSERT statement is only applicable if the SQL Server instance can directly read the source file. For cases where the source file is on a remote client, see this answer.
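When the server cannot read the file, one commonly used client-side alternative is to send rows in batches with executemany and commit once at the end, rather than executing and committing per row. Whether this matches the linked answer is not confirmed here. The sketch below assumes a pyodbc cursor (fast_executemany is a pyodbc cursor attribute, available from pyodbc 4.0.19); the function name insert_in_batches and the batch size are hypothetical.

```python
import itertools


def insert_in_batches(cursor, query, rows, batch_size=1000):
    """Send rows with executemany in batches; return the number of rows sent."""
    # Assumption: cursor is a pyodbc cursor. On other DB-API cursors this
    # attribute may simply be ignored.
    cursor.fast_executemany = True
    rows = iter(rows)
    total = 0
    while True:
        # Take the next batch_size rows without loading the whole file.
        batch = list(itertools.islice(rows, batch_size))
        if not batch:
            break
        cursor.executemany(query, batch)
        total += len(batch)
    return total
```

With a parameterised INSERT statement and a csv.reader as the row source, the call would look like insert_in_batches(cursor, query, reader), followed by a single cnxn.commit().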
