Improve INSERT-per-second performance of SQLite


The Question

Optimizing SQLite is tricky. Bulk-insert performance of a C application can vary from 85 inserts per second to over 96,000 inserts per second!

Background: We are using SQLite as part of a desktop application. We have large amounts of configuration data stored in XML files that are parsed and loaded into an SQLite database for further processing when the application is initialized. SQLite is ideal for this situation because it's fast, it requires no specialized configuration, and the database is stored on disk as a single file.

Rationale: Initially I was disappointed with the performance I was seeing. It turns out that the performance of SQLite can vary significantly (both for bulk inserts and selects) depending on how the database is configured and how you're using the API. It was not a trivial matter to figure out what all of the options and techniques were, so I thought it prudent to create this community wiki entry to share the results with Stack Overflow readers and save others the trouble of the same investigations.

The Experiment: Rather than simply talking about performance tips in the general sense (i.e. "Use a transaction!"), I thought it best to write some C code and actually measure the impact of various options. We're going to start with some simple data:

  • A 28 MB TAB-delimited text file (approximately 865,000 records) of the complete transit schedule for the city of Toronto
  • My test machine is a 3.60 GHz P4 running Windows XP.
  • The code is compiled with Visual C++ 2005 as "Release" with "Full Optimization" (/Ox) and Favor Fast Code (/Ot).
  • I'm using the SQLite "Amalgamation", compiled directly into my test application. The SQLite version I happen to have is a bit older (3.6.7), but I suspect these results will be comparable to the latest release (please leave a comment if you think otherwise).

Let's write some code!

The Code: A simple C program that reads the text file line-by-line, splits the string into values and then inserts the data into an SQLite database. In this "baseline" version of the code, the database is created, but we won't actually insert data:

/*************************************************************
    Baseline code to experiment with SQLite performance.

    Input data is a 28 MB TAB-delimited text file of the
    complete Toronto Transit System schedule/route info
    from http://www.toronto.ca/open/datasets/ttc-routes/

**************************************************************/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <string.h>
#include "sqlite3.h"

#define INPUTDATA "C:\\TTC_schedule_scheduleitem_10-27-2009.txt"
#define DATABASE "c:\\TTC_schedule_scheduleitem_10-27-2009.sqlite"
#define TABLE "CREATE TABLE IF NOT EXISTS TTC (id INTEGER PRIMARY KEY, Route_ID TEXT, Branch_Code TEXT, Version INTEGER, Stop INTEGER, Vehicle_Index INTEGER, Day Integer, Time TEXT)"
#define BUFFER_SIZE 256

int main(int argc, char **argv) {

    sqlite3 * db;
    sqlite3_stmt * stmt;
    char * sErrMsg = 0;
    char * tail = 0;
    int nRetCode;
    int n = 0;

    clock_t cStartClock;

    FILE * pFile;
    char sInputBuf [BUFFER_SIZE] = "\0";

    char * sRT = 0;  /* Route */
    char * sBR = 0;  /* Branch */
    char * sVR = 0;  /* Version */
    char * sST = 0;  /* Stop Number */
    char * sVI = 0;  /* Vehicle */
    char * sDT = 0;  /* Date */
    char * sTM = 0;  /* Time */

    char sSQL [BUFFER_SIZE] = "\0";

    /*********************************************/
    /* Open the Database and create the Schema */
    sqlite3_open(DATABASE, &db);
    sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);

    /*********************************************/
    /* Open input file and import into Database*/
    cStartClock = clock();

    pFile = fopen (INPUTDATA,"r");
    while (!feof(pFile)) {

        fgets (sInputBuf, BUFFER_SIZE, pFile);

        sRT = strtok (sInputBuf, "\t");     /* Get Route */
        sBR = strtok (NULL, "\t");            /* Get Branch */
        sVR = strtok (NULL, "\t");            /* Get Version */
        sST = strtok (NULL, "\t");            /* Get Stop Number */
        sVI = strtok (NULL, "\t");            /* Get Vehicle */
        sDT = strtok (NULL, "\t");            /* Get Date */
        sTM = strtok (NULL, "\t");            /* Get Time */

        /* ACTUAL INSERT WILL GO HERE */

        n++;
    }
    fclose (pFile);

    printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

    sqlite3_close(db);
    return 0;
}


The "Control"

Running the code as-is doesn't actually perform any database operations, but it will give us an idea of how fast the raw C file I/O and string processing operations are.

Imported 864913 records in 0.94 seconds

Great! We can do 920,000 inserts per second, provided we don't actually do any inserts :-)

We're going to generate the SQL string using the values read from the file and invoke that SQL operation using sqlite3_exec:

sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, '%s', '%s', '%s', '%s', '%s', '%s', '%s')", sRT, sBR, sVR, sST, sVI, sDT, sTM);
sqlite3_exec(db, sSQL, NULL, NULL, &sErrMsg);

This is going to be slow because the SQL will be compiled into VDBE code for every insert and every insert will happen in its own transaction. How slow?

Imported 864913 records in 9933.61 seconds

Yikes! 2 hours and 45 minutes! That's only 85 inserts per second.

By default, SQLite will evaluate every INSERT / UPDATE statement within a unique transaction. If performing a large number of inserts, it's advisable to wrap your operation in a transaction:

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

pFile = fopen (INPUTDATA,"r");
while (!feof(pFile)) {

    ...

}
fclose (pFile);

sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

Imported 864913 records in 38.03 seconds

That's better. Simply wrapping all of our inserts in a single transaction improved our performance to 23,000 inserts per second.

Using a transaction was a huge improvement, but recompiling the SQL statement for every insert doesn't make sense if we're using the same SQL over and over. Let's use sqlite3_prepare_v2 to compile our SQL statement once and then bind our parameters to that statement using sqlite3_bind_text:

/* Open input file and import into the database */
cStartClock = clock();

sprintf(sSQL, "INSERT INTO TTC VALUES (NULL, @RT, @BR, @VR, @ST, @VI, @DT, @TM)");
sqlite3_prepare_v2(db,  sSQL, BUFFER_SIZE, &stmt, &tail);

sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

pFile = fopen (INPUTDATA,"r");
while (!feof(pFile)) {

    fgets (sInputBuf, BUFFER_SIZE, pFile);

    sRT = strtok (sInputBuf, "\t");   /* Get Route */
    sBR = strtok (NULL, "\t");        /* Get Branch */
    sVR = strtok (NULL, "\t");        /* Get Version */
    sST = strtok (NULL, "\t");        /* Get Stop Number */
    sVI = strtok (NULL, "\t");        /* Get Vehicle */
    sDT = strtok (NULL, "\t");        /* Get Date */
    sTM = strtok (NULL, "\t");        /* Get Time */

    sqlite3_bind_text(stmt, 1, sRT, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 2, sBR, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 3, sVR, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 4, sST, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 5, sVI, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 6, sDT, -1, SQLITE_TRANSIENT);
    sqlite3_bind_text(stmt, 7, sTM, -1, SQLITE_TRANSIENT);

    sqlite3_step(stmt);

    sqlite3_clear_bindings(stmt);
    sqlite3_reset(stmt);

    n++;
}
fclose (pFile);

sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);

printf("Imported %d records in %4.2f seconds\n", n, (clock() - cStartClock) / (double)CLOCKS_PER_SEC);

sqlite3_finalize(stmt);
sqlite3_close(db);

return 0;

Imported 864913 records in 16.27 seconds

Nice! There's a little bit more code (don't forget to call sqlite3_clear_bindings and sqlite3_reset), but we've more than doubled our performance to 53,000 inserts per second.

By default, SQLite will pause after issuing an OS-level write command. This guarantees that the data is written to the disk. By setting synchronous = OFF, we are instructing SQLite to simply hand off the data to the OS for writing and then continue. There's a chance that the database file may become corrupted if the computer suffers a catastrophic crash (or power failure) before the data is written to the platter:

/* Open the database and create the schema */
sqlite3_open(DATABASE, &db);
sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);

Imported 864913 records in 12.41 seconds

The improvements are now smaller, but we're up to 69,600 inserts per second.

Consider storing the rollback journal in memory by evaluating PRAGMA journal_mode = MEMORY. Your transaction will be faster, but if you lose power or your program crashes during a transaction, your database could be left in a corrupt state with a partially-completed transaction:

/* Open the database and create the schema */
sqlite3_open(DATABASE, &db);
sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

Imported 864913 records in 13.50 seconds

A little slower than the previous optimization at 64,000 inserts per second.

Let's combine the previous two optimizations. It's a little more risky (in case of a crash), but we're just importing data (not running a bank):

/* Open the database and create the schema */
sqlite3_open(DATABASE, &db);
sqlite3_exec(db, TABLE, NULL, NULL, &sErrMsg);
sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, &sErrMsg);
sqlite3_exec(db, "PRAGMA journal_mode = MEMORY", NULL, NULL, &sErrMsg);

Imported 864913 records in 12.00 seconds

Fantastic! We're able to do 72,000 inserts per second.

Just for kicks, let's build upon all of the previous optimizations and redefine the database filename so we're working entirely in RAM:

#define DATABASE ":memory:"

Imported 864913 records in 10.94 seconds

It's not super-practical to store our database in RAM, but it's impressive that we can perform 79,000 inserts per second.

Although not specifically an SQLite improvement, I don't like the extra char* assignment operations in the while loop. Let's quickly refactor that code to pass the output of strtok() directly into sqlite3_bind_text(), and let the compiler try to speed things up for us:

pFile = fopen (INPUTDATA,"r");
while (!feof(pFile)) {

    fgets (sInputBuf, BUFFER_SIZE, pFile);

    sqlite3_bind_text(stmt, 1, strtok (sInputBuf, "\t"), -1, SQLITE_TRANSIENT); /* Get Route */
    sqlite3_bind_text(stmt, 2, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Branch */
    sqlite3_bind_text(stmt, 3, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Version */
    sqlite3_bind_text(stmt, 4, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Stop Number */
    sqlite3_bind_text(stmt, 5, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Vehicle */
    sqlite3_bind_text(stmt, 6, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Date */
    sqlite3_bind_text(stmt, 7, strtok (NULL, "\t"), -1, SQLITE_TRANSIENT);    /* Get Time */

    sqlite3_step(stmt);        /* Execute the SQL Statement */
    sqlite3_clear_bindings(stmt);    /* Clear bindings */
    sqlite3_reset(stmt);        /* Reset VDBE */

    n++;
}
fclose (pFile);

Note: We're back to using a real database file. In-memory databases are fast, but not necessarily practical.

Imported 864913 records in 8.94 seconds

A slight refactoring to the string processing code used in our parameter binding has allowed us to perform 96,700 inserts per second. I think it's safe to say that this is plenty fast. As we start to tweak other variables (i.e. page size, index creation, etc.) this will be our benchmark.

I hope you're still with me! The reason we started down this road is that bulk-insert performance varies so wildly with SQLite, and it's not always obvious what changes need to be made to speed up our operation. Using the same compiler (and compiler options), the same version of SQLite, and the same data, we've optimized our code and our usage of SQLite to go from a worst-case scenario of 85 inserts per second to over 96,000 inserts per second!

Before we start measuring SELECT performance, we know that we'll be creating indices. It's been suggested in one of the answers below that when doing bulk inserts, it is faster to create the index after the data has been inserted (as opposed to creating the index first then inserting the data). Let's try:

Create Index then Insert Data

sqlite3_exec(db, "CREATE  INDEX 'TTC_Stop_Index' ON 'TTC' ('Stop')", NULL, NULL, &sErrMsg);
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);
...

Imported 864913 records in 18.13 seconds

Insert Data then Create Index

...
sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);
sqlite3_exec(db, "CREATE  INDEX 'TTC_Stop_Index' ON 'TTC' ('Stop')", NULL, NULL, &sErrMsg);

Imported 864913 records in 13.66 seconds

As expected, bulk-inserts are slower if one column is indexed, but it does make a difference if the index is created after the data is inserted. Our no-index baseline is 96,000 inserts per second. Creating the index first then inserting data gives us 47,700 inserts per second, whereas inserting the data first then creating the index gives us 63,300 inserts per second.

I'd gladly take suggestions for other scenarios to try... And will be compiling similar data for SELECT queries soon.

The Accepted Answer

Several tips:

  1. Put inserts/updates in a transaction.
  2. For older versions of SQLite - Consider a less paranoid journal mode (pragma journal_mode). There is NORMAL, and then there is OFF, which can significantly increase insert speed if you're not too worried about the database possibly getting corrupted if the OS crashes. If your application crashes the data should be fine. Note that in newer versions, the OFF/MEMORY settings are not safe for application level crashes.
  3. Playing with page sizes makes a difference as well (PRAGMA page_size). Having larger page sizes can make reads and writes go a bit faster as larger pages are held in memory. Note that more memory will be used for your database.
  4. If you have indices, consider calling CREATE INDEX after doing all your inserts. This is significantly faster than creating the index and then doing your inserts.
  5. You have to be quite careful if you have concurrent access to SQLite, as the whole database is locked when writes are done, and although multiple readers are possible, writes will be locked out. This has been improved somewhat with the addition of a WAL in newer SQLite versions.
  6. Take advantage of saving space...smaller databases go faster. For instance, if you have key value pairs, try making the key an INTEGER PRIMARY KEY if possible, which will replace the implied unique row number column in the table.
  7. If you are using multiple threads, you can try using the shared page cache, which will allow loaded pages to be shared between threads, which can avoid expensive I/O calls.
  8. Don't use !feof(file)!

I've also asked similar questions here and here.
