SQLite 可以处理 9000 万条记录吗? [英] Can SQLite handle 90 million records?

查看:20
本文介绍了SQLite 可以处理 9000 万条记录吗?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

或者我应该使用不同的锤子来解决这个问题.

Or should I use a different hammer to fix this problem.

我有一个非常简单的数据存储用例,实际上是一个稀疏矩阵,我试图将其存储在 SQLite 数据库中.我创建了一个表:

I've got a very simple use-case for storing data, effectively a sparse matrix, which I've attempted to store in a SQLite database. I've created a table:

create TABLE data ( id1 INTEGER KEY, timet INTEGER KEY, value REAL )

我在其中插入了大量数据(每 10 分钟 800 个元素,每天 45 次),一年中的大部分时间.(id1,timet) 的元组将始终是唯一的.

into which I insert a lot of data, (800 elements every 10 minutes, 45 times a day), most days of the year. The tuple of (id1,timet) will always be unique.

timet 值是自纪元以来的秒数,并且会一直增加.出于所有实际目的,id1 是一个随机整数.虽然可能只有 20000 个唯一 ID.

The timet value is seconds since the epoch, and will always be increasing. The id1 is, for all practical purposes, a random integer. There is probably only 20000 unique ids though.

然后我想访问 id1==someid 的所有值或访问 timet==sometime 的所有元素.在我通过 Linux 上的 C 接口使用最新 SQLite 的测试中,查找其中一个(或此查找的任何变体)大约需要 30 秒,这对于我的用例来说不够快.

I'd then like to access all values where id1==someid or access all elements where timet==sometime. On my tests using the latest SQLite via the C interface on Linux, a lookup for one of these (or any variant of this lookup) takes approximately 30 seconds, which is not fast enough for my use case.

我尝试为数据库定义一个索引,但这会使插入速度减慢到完全不可行的速度(虽然我可能做错了...)

I tried defining an index for the database, but this slowed down insertion to completely unworkable speeds (I might have done this incorrectly though...)

上表导致对任何数据的访问都非常缓慢.我的问题是:

The table above leads to very slow access for any data. My question is:

  • SQLite 是完全错误的工具吗?
  • 我可以定义索引以显着加快速度吗?
  • 为此我应该使用 HDF5 之类的东西而不是 SQL 吗?

请原谅我对 SQL 的基本了解!

Please excuse my very basic understanding of SQL!

谢谢

我包含了一个代码示例,展示了在使用索引时插入速度如何减慢到爬行.使用创建索引"语句,代码需要 19 分钟才能完成.没有它,它会在 18 秒内运行.

I include a code sample that shows how the insertion speed slows to a crawl when using indices. With the 'create index' statements in place, the code takes 19 minutes to complete. Without that, it runs in 18 seconds.

#include <iostream>
#include <sqlite3.h>

void checkdbres( int res, int expected, const std::string msg ) 
{
  if (res != expected) { std::cerr << msg << std::endl; exit(1); } 
}

int main(int argc, char **argv)
{
  const size_t nRecords = 800*45*30;

  sqlite3      *dbhandle = NULL;
  sqlite3_stmt *pStmt = NULL;
  char statement[512];

  checkdbres( sqlite3_open("/tmp/junk.db", &dbhandle ), SQLITE_OK, "Failed to open db");

  checkdbres( sqlite3_prepare_v2( dbhandle, "create table if not exists data ( issueid INTEGER KEY, time INTEGER KEY, value REAL);", -1, & pStmt, NULL ), SQLITE_OK, "Failed to build create statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute insert statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize insert");
  checkdbres( sqlite3_prepare_v2( dbhandle, "create index issueidindex on data (issueid );", -1, & pStmt, NULL ), SQLITE_OK, "Failed to build create statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute insert statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize insert");
  checkdbres( sqlite3_prepare_v2( dbhandle, "create index timeindex on data (time);", -1, & pStmt, NULL ), SQLITE_OK, "Failed to build create statement");
  checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute insert statement" );
  checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize insert");

  for ( size_t idx=0; idx < nRecords; ++idx)
  {
    if (idx%800==0)
    {
      checkdbres( sqlite3_prepare_v2( dbhandle, "BEGIN TRANSACTION", -1, & pStmt, NULL ), SQLITE_OK, "Failed to begin transaction");
      checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute begin transaction" );
      checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize begin transaction");
      std::cout << "idx " << idx << " of " << nRecords << std::endl;
    }

    const size_t time = idx/800;
    const size_t issueid = idx % 800;
    const float value = static_cast<float>(rand()) / RAND_MAX;
    sprintf( statement, "insert into data values (%d,%d,%f);", issueid, (int)time, value );
    checkdbres( sqlite3_prepare_v2( dbhandle, statement, -1, &pStmt, NULL ), SQLITE_OK, "Failed to build statement");
    checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute insert statement" );
    checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize insert");

    if (idx%800==799)
    {
      checkdbres( sqlite3_prepare_v2( dbhandle, "END TRANSACTION", -1, & pStmt, NULL ), SQLITE_OK, "Failed to end transaction");
      checkdbres( sqlite3_step( pStmt ), SQLITE_DONE, "Failed to execute end transaction" );
      checkdbres( sqlite3_finalize( pStmt ), SQLITE_OK, "Failed to finalize end transaction");
    }
  }

  checkdbres( sqlite3_close( dbhandle ), SQLITE_OK, "Failed to close db" ); 
}

<小时>

推荐答案

您是否一次插入所有 800 个元素?如果是这样,在事务中进行插入会显着加快进程.

Are you inserting all of the 800 elements at once? If you are, doing the inserts within a transaction will speed up the process dramatically.

参见 http://www.sqlite.org/faq.html#q19

SQLite 可以处理非常大的数据库.请参阅 http://www.sqlite.org/limits.html

SQLite can handle very large databases. See http://www.sqlite.org/limits.html

这篇关于SQLite 可以处理 9000 万条记录吗?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆