Retrieve SQLite rows by rowid efficiently

Problem description



I am using the C interface to SQLite and have some basic questions about the rowid field and how to efficiently retrieve data from an arbitrary set of rows with known rowids. I actually have several related questions, so I will call them out in bold as I go. But the main questions I have are at the end.

I have a table:

sqlite3_exec( db, "create table mytable ( value BLOB, value2 TEXT )", NULL, NULL, NULL );

and I fill it with 2.3 million rows. I also create two indexes on the table:

sqlite3_exec( db, "CREATE INDEX r_index ON mytable (rowid)", NULL, NULL, &errorMessage );

sqlite3_exec( db, "CREATE INDEX v_index ON mytable (value)", NULL, NULL, &errorMessage );

I am aware that the rowid index is unnecessary. I see that SQLite takes 0 sec to "create" the rowid index, and I believe this is because rowid is always an implicit existing "index" on a table, since the table is (usually?) stored in rowid order.

In any case, what I want to be able to do is retrieve arbitrary sets of rows quickly from this table, by rowid. What I do is create an in-memory list of records:

class MyInMemoryIndexElement
{
public:
    sqlite3_int64 _rowId;
    MyKeyType _key;
};

vector<MyInMemoryIndexElement> inMemoryIndex;

rc = sqlite3_prepare_v2( db, "select rowid, value from mytable", -1, &stmt, NULL );

for ( ; sqlite3_step( stmt ) == SQLITE_ROW ; )
{
    MyInMemoryIndexElement e;
    e._rowId = sqlite3_column_int64( stmt, 0 );
    e._key = GetMyKeyFromValueBlob( sqlite3_column_blob( stmt, 1 ) );
    inMemoryIndex.push_back( e );
}

The loop above, which reads through all 2.3 million records and builds this in-memory vector, takes only 1.5 seconds (and could probably be made faster by preallocating space for the vector). In fact, when I turn off the part that actually adds the record to the vector, the query alone takes only 0.95 sec. Even more surprising, when I use sqlite3_exec() with a callback function instead of the statement/step method, I can read all of the "value" blobs in the database in 0.55 sec. I also found that without an index on the "value" column, these select statements take about 5 seconds longer. (Not my main question, but I don't understand why indexing the "value" column would make it faster to query every row for its "value". Maybe the query engine can read the values stored in the index instead of reading them from the table itself?)
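The remark about preallocating the vector can be sketched roughly as follows. This is only an illustration: IndexElement, buildIndex, and the expectedRows hint are hypothetical names, the key type is simplified to a plain integer, and the row source is a generic callback standing in for the sqlite3_step() loop. In practice the row count could come from a prior "select count(*) from mytable".

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for MyInMemoryIndexElement; the key is simplified to an integer.
struct IndexElement {
    std::int64_t rowId;
    std::int64_t key;
};

// Builds the in-memory index from a row source. nextRow stands in for the
// sqlite3_step() loop: it fills its argument and returns false when no rows remain.
std::vector<IndexElement> buildIndex(std::size_t expectedRows,
                                     const std::function<bool(IndexElement&)>& nextRow) {
    std::vector<IndexElement> index;
    index.reserve(expectedRows);  // one allocation up front instead of repeated regrowth
    IndexElement e{};
    while (nextRow(e)) {
        index.push_back(e);
    }
    return index;
}
```

Whether this saves a noticeable share of the 1.5 seconds would have to be measured; it only removes the vector's reallocation and copying cost.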

Another important comment is that when I step through that loop in the debugger, I see that the rows are processed in an unexpected order. I was thinking that I would get rowid 1 first, then rowid 2, and so on, since I'm not specifying anything about sorting, and I'm just asking it to give me all the rows one at a time. However, what I find is that the first rowid I get is somewhere in the 600,000's, and then the rowids jump around from there. So maybe that's because it's returning the rows in the order of the "value" index, which is some b-tree order that has nothing to do with the physical record / rowid order?

Anyway, so now I have this index in memory, and at various times in the program I want to walk through that table, and check the _key of each entry, and if that _key has certain properties, I want to get the "value" for that guy. So I have a loop:

sqlite3_stmt *stmt;
rc = sqlite3_prepare_v2( db, "select value from mytable where rowid = ?", -1, &stmt, NULL );

for ( size_t i = 0 ; i < inMemoryIndex.size() ; i++ )
{
    if ( MySpecialFunction( inMemoryIndex[ i ]._key ) )
    {
        sqlite3_reset( stmt );
        sqlite3_clear_bindings( stmt );
        sqlite3_bind_int64( stmt, 1, inMemoryIndex[ i ]._rowId );

        if ( sqlite3_step( stmt ) == SQLITE_ROW )
        {
            const void *v = sqlite3_column_blob( stmt, 0 );
            DoWhatIWantWithV( v );
        }
    }
}

Unfortunately (and here we get to my main question), that loop takes about 1.6 seconds to run in the case that about 14,000 out of the 2.3 million records pass the MySpecialFunction() test. That is, it takes about 1.6 seconds to read 14,000 records, whereas it took only 0.55 seconds to read all 2.3 million records.

Because of the strange rowid ordering mentioned above, I did try sorting inMemoryIndex by rowid. This made it run in about 1.3 seconds instead of 1.6.
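Sorting the in-memory index by rowid, as described above, could look like the sketch below. IndexElement and sortByRowId are hypothetical names, and the key type is again simplified to an integer. Visiting rowids in ascending order tends to touch neighboring b-tree pages one after another, which is a plausible (but unverified) reason for the modest improvement.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for MyInMemoryIndexElement; the key is simplified to an integer.
struct IndexElement {
    std::int64_t rowId;
    std::int64_t key;
};

// Sorts the index in ascending rowid order so later lookups walk the table
// in the same order in which its rows are stored.
void sortByRowId(std::vector<IndexElement>& index) {
    std::sort(index.begin(), index.end(),
              [](const IndexElement& a, const IndexElement& b) {
                  return a.rowId < b.rowId;
              });
}
```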

So my main question is:

I am able to use the statement/step to select every "value" blob in the 2.3 million row database in 0.95 sec (and in fact if I use the sqlite3_exec() method with a callback I can do it in 0.55 sec).

I went to the trouble of creating my inMemoryIndex vector because in most cases at any given time I only want records for a small subset of the 2.3 million rows, for instance 14,000 of them. So I thought if I knew these 14,000 rowids I could "just read those rows". But when I do that with the

"select value from mytable where rowid = ?"

statement, iteratively binding each of the known rowids, it takes 1.6 seconds, significantly longer than reading every row in the database.

So:

(1) Is there a small change I could make to this approach (e.g., some other index, order of operations, etc.) that could speed it up?

(2) Is there something fundamentally flawed about this way of doing things?

(I should comment that I do realize that creating my own in-memory index like this goes against the idea that I should leave query planning up to the SQL engine itself. I'm doing it this way because in general my logic for deciding which records I'm interested in at a given time, as expressed by MySpecialFunction() in the code above, is more complex than I think I can do in SQL logic. I'm open to the idea that I need to reconsider that. But for now my question is just about the fact that it seems surprising that it takes so much longer to read 14k records from known rowids than it takes to read all 2.3 million records.)


Update/Solution

Here is the code I added as suggested by pm100, which brought the time to read those 14,000 rows down to about 0.19 sec. It's still more than 1/3 the time it took to read the full 2.3 million records, but I'll take it.

Note that inMemoryIndex has been sorted by _rowId.

sqlite3_intarray *intArrayPointer1;

sqlite3_intarray_create( db, "int_array_1", &intArrayPointer1 );

vector<sqlite3_int64> v;
for ( size_t i = 0 ; i < inMemoryIndex.size() ; i++ )
{
    if ( MySpecialFunction( inMemoryIndex[ i ]._key ) )
    {
        v.push_back( inMemoryIndex[ i ]._rowId );
    }
}

sqlite3_intarray_bind( intArrayPointer1, (int)v.size(), v.data(), NULL );

sqlite3_stmt *stmt;
sqlite3_prepare_v2( db, "select value from mytable where rowid in int_array_1", -1, &stmt, NULL );

for ( ; sqlite3_step( stmt ) == SQLITE_ROW ; )
{
    const void *blob = sqlite3_column_blob( stmt, 0 );
    // ... work with "value" blob as you wish
}

Solution

There is a code add-on that uses a virtual table to do exactly what you want.

https://www.sqlite.org/src/artifact/9dc57417fb65bc78
https://www.sqlite.org/src/artifact/870124b95ec4c645
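If that add-on is not available in a given build, a rough alternative is to look rows up in batches with an ordinary "IN (?,?,...)" list. This is a different technique from the virtual-table approach and has not been timed against this data set, so treat it as a sketch only. buildInQuery is a hypothetical helper; each batch would be prepared once and its parameters bound with sqlite3_bind_int64(). SQLite caps the number of bound parameters per statement (SQLITE_MAX_VARIABLE_NUMBER), so the batch size has to stay below that limit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: builds
// "select value from mytable where rowid in (?,?,...)" with `count` placeholders.
std::string buildInQuery(std::size_t count) {
    assert(count > 0);  // an empty IN list is not useful here
    std::string sql = "select value from mytable where rowid in (";
    for (std::size_t i = 0; i < count; ++i) {
        sql += (i == 0) ? "?" : ",?";  // comma-separate every placeholder after the first
    }
    sql += ")";
    return sql;
}
```

For example, buildInQuery(3) yields "select value from mytable where rowid in (?,?,?)", which can be prepared once and reused for every full batch of three rowids.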
