H2数据库:尽管使用索引,但查询缓慢 [英] H2 database: slow query although index is used

查看:330
本文介绍了H2数据库:尽管使用索引,但查询缓慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用H2 1.3.176。

Using the H2 1.3.176.

1)表定义:

CREATE TABLE TEST(ID BIGINT PRIMARY KEY, ACCOUNT BIGINT, TXID BIGINT); 

2)将值插入表中:

INSERT INTO TEST SELECT X, RAND()*100, X FROM SYSTEM_RANGE(1, 1000000)

3)创建用于我的查询的索引:

3) creating an index to use for my query:

CREATE Unique INDEX IDX_TEST_ACCOUNT_TXID ON `test` (account, txId DESC);

4)执行以下查询:

explain analyze
select txid from test where account=22 AND txid<9999999 order by txid desc limit 25

我得到以下执行计划:

SELECT
    TXID
FROM PUBLIC.TEST
    /* PUBLIC.IDX_TEST_ACCOUNT_TXID: ACCOUNT = 22
        AND TXID < 9999999
     */
    /* scanCount: 9867 */
WHERE (ACCOUNT = 22)
    AND (TXID < 9999999)
ORDER BY 1 DESC
LIMIT 25
/*
TEST.IDX_TEST_ACCOUNT_TXID read: 103
*/

问题:为什么H2需要扫描整个索引?我原本希望扫描计数为25,因为索引中的txid应该已经按降序排列,所以一旦H2在索引的account = 22分支中,它就应该能够读取接下来的25个条目。如果表中有数百万个条目,这将导致查询速度变慢。即使H2必须在索引中搜索第一个匹配条目,我也希望这是O(log(N))算法而不是扫描。
如果在没有列帐户的情况下执行相同的操作(意味着表仅包含id和txid),则txid的降序索引确实会导致扫描计数为25(使用查询从测试中选择txid,其中txid< 9999999按txid desc排序)。
为什么附加的列会破坏执行计划?
也许我不了解索引的工作原理。有没有更好的方法来为查询定义索引?

Question: why does H2 need to scan through the entire index? I was expecting the scan count to be 25 since the txid in the index should be in descending order already so once H2 is in the account=22 branch of the index it should be able to just read the next 25 entries. This will lead to slow queries if there are millions of entries in the table. Even if H2 has to search for the first matching entry within the index I would expect this to be an O(log(N)) algorithm and not a scan. If I do the same thing without the column account (means that the table just contains id and txid) then a descending index on txid will indeed result in a scan count of 25 (using the query "select txid from test where txid<9999999 order by txid desc"). Why is the additional column ruining the execution plan? Maybe I don't understand how the index works. Is there a better way to define an index for my query?

推荐答案

我逐步浏览了h2源代码,发现了什么是错误:

I stepped through the h2 source code and found out what is going wrong:

在准备执行查询期间,h2尝试确定是否可以使用索引对结果集进行排序和限制。由于第一个索引列(帐户)不在order by子句中,因此h2认为它无法使用该索引。这导致h2扫描整个索引以获取所有行,然后对结果集进行排序和限制。这是令人惊讶的,因为考虑的条件是相等条件,因此h2应该意识到它确实可以使用索引来排序和限制结果集。
解决方案是在order by子句中提供account列。因此查询应该是:

During preparing the execution of the query, h2 tries to determine whether it can use the index for sorting and limiting the result set. Since the first index column (account) is not in the order by clause, h2 thinks it cannot use the index. This results in h2 scanning through the entire index fetching all rows and then sorting and limiting the result set. This is surprising since the condition on account is an "equal" condition so h2 should realize that it can indeed use the index for sorting and limiting the result set. The solution is to provide the account column in the order by clause. Thus the query should be:

select txid from test where account=22 AND txid<9999999 order by account, txid desc limit 25

我得到了预期的执行计划

and i get the expected execution plan

SELECT
    TXID
FROM PUBLIC.TEST
    /* PUBLIC.IDX_TEST_ACCOUNT_TXID: ACCOUNT = 22
        AND TXID < 9999999
     */
    /* scanCount: 25 */
WHERE (ACCOUNT = 22)
    AND (TXID < 9999999)
ORDER BY =ACCOUNT, 1 DESC
LIMIT 25
/* index sorted */

具有扫描计数只有25个:)

which has a scan count of only 25 :)

这篇关于H2数据库:尽管使用索引,但查询缓慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆