Bulk insert huge data into SQLite using Python


Question

I read this: Importing a CSV file into a sqlite3 database table using Python

and it seems that everyone suggests using line-by-line reading instead of using bulk .import from SQLite. However, that will make the insertion really slow if you have millions of rows of data. Is there any other way to circumvent this?

Update: I tried the following code to insert line by line, but the speed is not as good as I expected. Is there any way to improve it?

import codecs

# 'allLogFilesName' is the list of tab-separated log file paths;
# 'connection' is the open sqlite3 connection and 'output' its cursor.
for logFileName in allLogFilesName:
    logFile = codecs.open(logFileName, 'rb', encoding='utf-8')
    for logLine in logFile:
        # one execute() call (one INSERT statement) per log line
        logLineAsList = logLine.split('\t')
        output.execute('''INSERT INTO log VALUES(?, ?, ?, ?)''', logLineAsList)
    logFile.close()
connection.commit()
connection.close()

Answer

Divide your data into chunks on the fly using generator expressions, and make the inserts inside a transaction. Here's a quote from the SQLite optimization FAQ:

Unless already in a transaction, each SQL statement has a new transaction started for it. This is very expensive, since it requires reopening, writing to, and closing the journal file for each statement. This can be avoided by wrapping sequences of SQL statements with BEGIN TRANSACTION; and END TRANSACTION; statements. This speedup is also obtained for statements which don't alter the database.

Here is how your code might look:
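A minimal sketch of that approach, assuming the same four-column log table and tab-separated log files as in the question; the database file name, chunk size, and helper names below are illustrative:

import sqlite3
from itertools import islice

def log_lines(file_names):
    # Yield one row (a list of tab-separated fields) per log line, across all files.
    for name in file_names:
        with open(name, encoding='utf-8') as log_file:
            for line in log_file:
                yield line.rstrip('\n').split('\t')

def chunks(iterable, size=10000):
    # Group an iterator into lists of at most `size` rows.
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

allLogFilesName = ['server1.log', 'server2.log']  # illustrative paths

connection = sqlite3.connect('log.db')  # illustrative database file
cursor = connection.cursor()

for chunk in chunks(log_lines(allLogFilesName)):
    # Each chunk is inserted inside a single implicit transaction, so the
    # journal is written once per chunk instead of once per row.
    cursor.executemany('INSERT INTO log VALUES (?, ?, ?, ?)', chunk)
    connection.commit()

connection.close()

Committing once per chunk keeps memory use bounded while still amortizing the per-transaction journal overhead over thousands of rows.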

Also, SQLite itself has the ability to import CSV files.
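For example, in the sqlite3 command-line shell (the file and table names here are illustrative; for the tab-separated log files in the question, set the separator to a tab instead of using CSV mode):

sqlite> .mode csv
sqlite> .import /path/to/data.csv log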
