SQLite:COUNT在大表上慢 [英] SQLite: COUNT slow on big tables

查看:2365
本文介绍了SQLite:COUNT在大表上慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在SQLite中有一个性能问题,在大型表上有一个SELECT COUNT(*)。



由于我还没有收到一个有用的答案,我做了一些进一步的测试,我编辑我的问题,以纳入我的新发现。



我有两个表:

  CREATE TABLE Table1(
Key INTEGER NOT NULL,
...其他几个字段...,
状态CHAR(1)NOT NULL,
选择VARCHAR NULL,
CONSTRAINT PK_Table1 PRIMARY KEY(Key ASC))

CREATE Table2(
Key INTEGER NOT NULL,
Key2 INTEGER NOT NULL,
...一些其他字段...,
CONSTRAINT PK_Table2 PRIMARY KEY(Key ASC,Key2 ASC))

Table1有大约800万条记录,Table2有大约5100万条记录,数据库文件大小超过5GB。



Table1还有2个索引:

  CREATE INDEX IDX_Table1_Status ON Table1(状态ASC,密钥ASC)
CREATE INDEX IDX_Table1_Selection ON Table1 Key ASC)

状态是必填字段,但只有6个不同的值,选择不需要,并且只有大约150万个值不同于null和只有大约600k不同的值。



我对这两个表做了一些测试,你可以看到下面的时间,我为每个请求(QP)添加了解释查询计划。我把数据库文件放在USB存储器上,所以我可以在每次测试后删除它,并获得可靠的结果,而不受磁盘缓存的干扰。有些请求在USB上更快(我猜想由于缺少查找时间),但有些请求更慢(表扫描)。

  SELECT COUNT(*)FROM Table1 
时间:105秒
QP:SCAN TABLE表1使用覆盖索引IDX_Table1_Selection(〜1000000行)
SELECT COUNT(Key)FROM Table1
时间:153 sec
QP:SCAN TABLE Table1(〜1000000 rows)
SELECT * FROM Table1 WHERE Key = 5123456
时间:5 ms
QP:SEARCH TABLE Table1使用INTEGER PRIMARY KEY =?)(〜1 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key> 5123456 LIMIT 1
时间:16秒
QP:SEARCH TABLE表1使用索引IDX_Table1_Status(Status =?)(〜3 rows)
SELECT * FROM Table1 WHERE Selection ='SomeValue'AND Key> ; 5123456 LIMIT 1
时间:9 ms
QP:SEARCH TABLE表1使用索引IDX_Table1_Selection(选择=?)(〜3行)

正如你可以看到,计数很慢,但正常的选择是快速的(除了第二个,花了16秒)。



表2同样如此:

  SELECT COUNT(*)FROM Table2 
时间:528秒
QP:SCAN TABLE Table2使用覆盖索引sqlite_autoindex_Table2_1(〜1000000行)
SELECT COUNT(Key)FROM Table2
时间:249秒
QP:SCAN TABLE Table2(〜1000000行)
SELECT * FROM Table2 WHERE Key = 5123456 AND Key2 = 0
时间:7 ms
QP:SEARCH TABLE Table2使用索引sqlite_autoindex_Table2_1(Key =?AND Key2 =?) )

为什么SQLite不在Table1的主键上使用自动创建的索引?
为什么当他使用Table2上的自动索引时,它仍然需要很多时间?



我创建了相同的内容相同的表, SQL Server 2008 R2上的索引和计数几乎是即时的。



下面的一个注释建议在数据库上执行ANALYZE。我做了,需要11分钟完成。
之后,我再次运行一些测试:

  SELECT COUNT(*)FROM Table1 
时间:104秒
QP:SCAN TABLE表1使用覆盖索引IDX_Table1_Selection(〜7848023行)
SELECT COUNT(Key)FROM Table1
时间:151秒
QP:SCAN TABLE Table1 (〜7848023 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key> 5123456 LIMIT 1
时间:5 ms
QP:SEARCH TABLE表1使用整数PRIMARY KEY(rowid> ;?)(〜196200行)
SELECT COUNT(*)FROM Table2
时间:529 sec
QP:SCAN TABLE Table2使用覆盖索引sqlite_autoindex_Table2_1(〜51152542行)
SELECT COUNT(Key)FROM Table2
时间:249秒
QP:SCAN TABLE Table2 〜51152542 rows)

正如你所看到的,查询花费了相同的时间现在显示实际行数),只有较慢的选择现在也很快。



接下来,我在Table1的Key字段上创建dan extra索引,到自动索引。我在原始数据库上做了这个,没有ANALYZE数据。创建此索引需要超过23分钟(请记住,这是在USB记忆棒上)。

  CREATE INDEX IDX_Table1_Key ON Table1 (Key ASC)

然后我再次运行测试:

  SELECT COUNT(*)FROM Table1 
时间:4秒
QP:SCAN TABLE表1使用封面索引IDX_Table1_Key(〜1000000行)
SELECT COUNT(Key)FROM Table1
时间:167秒
QP:SCAN TABLE表2(〜1000000行)
SELECT * FROM TABLE1 WHERE Status = 73 AND Key> 5123456 LIMIT 1
时间:17秒
QP:SEARCH TABLE表1使用索引IDX_Table1_Status(状态=?)(约3行)

正如你所看到的,索引帮助计数(*),但不是计数(Key)。



Finaly,我使​​用列约束而不是表约束创建了表:

  CREATE TABLE Table1(
Key INTEGER PRIMARY KEY ASC NOT NULL,
...几个其他字段...,
状态CHAR(1)NOT NULL,
选择VARCHAR NULL)



然后我再次运行测试:

  SELECT COUNT(*)FROM Table1 
时间:6秒
QP:SCAN TABLE表1使用覆盖索引IDX_Table1_Selection(〜1000000行)
SELECT COUNT(Key)FROM Table1
时间:28秒
QP:SCAN TABLE表1(〜1000000行)
SELECT * FROM TABLE1 WHERE Status = 73 AND Key> 5123456限制1
时间:10秒
QP:SEARCH TABLE表1使用索引IDX_Table1_Status(状态=?)(〜3行)

虽然查询计划是相同的,但时间会好很多。为什么是这样?



问题是,ALTER TABLE不允许转换现有的表,我有很多现有的数据库,我不能转换为这种形式。此外,使用列约束代替表约束不会对Table2起作用。



有人知道我在做什么错了,如何解决这个问题? p>

我使用System.Data.SQLite版本1.0.74.0来创建表并运行我使用的SQLiteSpy 1.9.1的测试。



感谢,



Marc

解决方案

没有 DELETE d任何记录,执行:

  _ROWID_)FROMtableLIMIT 1; 

将避免全表扫描。请注意, _ROWID _ 是SQLite标识符


I'm having a performance problem in SQLite with a SELECT COUNT(*) on a large tables.

As I didn't yet receive a usable answer and I did some further testing, I edited my question to incorporate my new findings.

I have 2 tables:

CREATE TABLE Table1 (
Key INTEGER NOT NULL,
... several other fields ...,
Status CHAR(1) NOT NULL,
Selection VARCHAR NULL,
CONSTRAINT PK_Table1 PRIMARY KEY (Key ASC))

CREATE Table2 (
Key INTEGER NOT NULL,
Key2 INTEGER NOT NULL,
... a few other fields ...,
CONSTRAINT PK_Table2 PRIMARY KEY (Key ASC, Key2 ASC))

Table1 has around 8 million records and Table2 has around 51 million records, and the databasefile is over 5GB.

Table1 has 2 more indexes:

CREATE INDEX IDX_Table1_Status ON Table1 (Status ASC, Key ASC)
CREATE INDEX IDX_Table1_Selection ON Table1 (Selection ASC, Key ASC)

"Status" is required field, but has only 6 distinct values, "Selection" is not required and has only around 1.5 million values different from null and only around 600k distinct values.

I did some tests on both tables, you can see the timings below, and I added the "explain query plan" for each request (QP). I placed the database file on an USB-memorystick so i could remove it after each test and get reliable results without interference of the disk cache. Some requests are faster on USB (I suppose due to lack of seektime), but some are slower (table scans).

SELECT COUNT(*) FROM Table1
    Time: 105 sec
    QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~1000000 rows)
SELECT COUNT(Key) FROM Table1
    Time: 153 sec
    QP: SCAN TABLE Table1 (~1000000 rows)
SELECT * FROM Table1 WHERE Key = 5123456
    Time: 5 ms
    QP: SEARCH TABLE Table1 USING INTEGER PRIMARY KEY (rowid=?) (~1 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
    Time: 16 sec
    QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)
SELECT * FROM Table1 WHERE Selection = 'SomeValue' AND Key > 5123456 LIMIT 1
    Time: 9 ms
    QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Selection (Selection=?) (~3 rows)

As you can see the counts are very slow, but normal selects are fast (except for the 2nd one, which took 16 seconds).

The same goes for Table2:

SELECT COUNT(*) FROM Table2
    Time: 528 sec
    QP: SCAN TABLE Table2 USING COVERING INDEX sqlite_autoindex_Table2_1(~1000000 rows)
SELECT COUNT(Key) FROM Table2
    Time: 249 sec
    QP: SCAN TABLE Table2 (~1000000 rows)
SELECT * FROM Table2 WHERE Key = 5123456 AND Key2 = 0
    Time: 7 ms
    QP: SEARCH TABLE Table2 USING INDEX sqlite_autoindex_Table2_1 (Key=? AND Key2=?) (~1 rows)

Why is SQLite not using the automatically created index on the primary key on Table1 ? And why, when he uses the auto-index on Table2, it still takes a lot of time ?

I created the same tables with the same content and indexes on SQL Server 2008 R2 and there the counts are nearly instantaneous.

One of the comments below suggested executing ANALYZE on the database. I did and it took 11 minutes to complete. After that, I ran some of the tests again:

SELECT COUNT(*) FROM Table1
    Time: 104 sec
    QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~7848023 rows)
SELECT COUNT(Key) FROM Table1
    Time: 151 sec
    QP: SCAN TABLE Table1 (~7848023 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
    Time: 5 ms
    QP: SEARCH TABLE Table1 USING INTEGER PRIMARY KEY (rowid>?) (~196200 rows)
SELECT COUNT(*) FROM Table2
    Time: 529 sec
    QP: SCAN TABLE Table2 USING COVERING INDEX sqlite_autoindex_Table2_1(~51152542 rows)
SELECT COUNT(Key) FROM Table2
    Time: 249 sec
    QP: SCAN TABLE Table2 (~51152542 rows)

As you can see, the queries took the same time (except the query plan is now showing the real number of rows), only the slower select is now also fast.

Next, I create dan extra index on the Key field of Table1, which should correspond to the auto-index. I did this on the original database, without the ANALYZE data. It took over 23 minutes to create this index (remember, this is on an USB-stick).

CREATE INDEX IDX_Table1_Key ON Table1 (Key ASC)

Then I ran the tests again:

SELECT COUNT(*) FROM Table1
    Time: 4 sec
    QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Key(~1000000 rows)
SELECT COUNT(Key) FROM Table1
    Time: 167 sec
    QP: SCAN TABLE Table2 (~1000000 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
    Time: 17 sec
    QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)

As you can see, the index helped with the count(*), but not with the count(Key).

Finaly, I created the table using a column constraint instead of a table constraint:

CREATE TABLE Table1 (
Key INTEGER PRIMARY KEY ASC NOT NULL,
... several other fields ...,
Status CHAR(1) NOT NULL,
Selection VARCHAR NULL)

Then I ran the tests again:

SELECT COUNT(*) FROM Table1
    Time: 6 sec
    QP: SCAN TABLE Table1 USING COVERING INDEX IDX_Table1_Selection(~1000000 rows)
SELECT COUNT(Key) FROM Table1
    Time: 28 sec
    QP: SCAN TABLE Table1 (~1000000 rows)
SELECT * FROM Table1 WHERE Status = 73 AND Key > 5123456 LIMIT 1
    Time: 10 sec
    QP: SEARCH TABLE Table1 USING INDEX IDX_Table1_Status (Status=?) (~3 rows)

Although the query plans are the same, the times are a lot better. Why is this ?

The problem is that ALTER TABLE does not permit to convert an existing table and I have a lot of existing databases which i can not convert to this form. Besides, using a column contraint instead of table constraint won't work for Table2.

Has anyone any idea what I am doing wrong and how to solve this problem ?

I used System.Data.SQLite version 1.0.74.0 to create the tables and to run the tests I used SQLiteSpy 1.9.1.

Thanks,

Marc

解决方案

If you haven't DELETEd any records, doing:

SELECT MAX(_ROWID_) FROM "table" LIMIT 1;

Will avoid the full-table scan. Note that _ROWID_ is a SQLite identifier.

这篇关于SQLite:COUNT在大表上慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆