从 SQLite 查询中删除临时 B 树排序 [英] Removing a Temporary B Tree Sort from a SQLite Query

查看:59
本文介绍了从 SQLite 查询中删除临时 B 树排序的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个非常基本的图片上传服务实现,您可以在其中上传图片并标记它们.这是我的架构:

I have a pretty basic implementation of an Image uploading service, where you can upload images and tag them. This is my schema:

CREATE TABLE Tag(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    pid_high UNSIGNED BIG INT NOT NULL, 
    pid_low UNSIGNED BIG INT NOT NULL, 
    name STRING NOT NULL, 
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);

CREATE TABLE TagBridge(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    pid_high UNSIGNED BIG INT NOT NULL, 
    pid_low UNSIGNED BIG INT NOT NULL, 
    image_id_high UNSIGNED BIG INT NOT NULL, 
    image_id_low UNSIGNED BIG INT NOT NULL, 
    tag_id_high UNSIGNED BIG INT NOT NULL, 
    tag_id_low UNSIGNED BIG INT NOT NULL, 
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);

CREATE TABLE Image(
    orm_id INTEGER PRIMARY KEY AUTOINCREMENT, 
    pid_high UNSIGNED BIG INT NOT NULL,
    pid_low UNSIGNED BIG INT NOT NULL, 
    filehash STRING NOT NULL, 
    mime STRING NOT NULL, 
    uploadedDate INTEGER NOT NULL, 
    ratingsAverage REAL, 
    CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);

和指数

CREATE INDEX ImageTest on Image(pid_high, pid_low, uploadedDate DESC);
CREATE INDEX ImagefilehashIndex ON Image (filehash);
CREATE INDEX ImageuploadedDateIndex ON Image (uploadedDate);
CREATE INDEX TagnameIndex ON Tag (name);

使用 pid_high/pid_low 字段而不是标准主键的原因是该服务使用客户端权威的 128 位 GUID,但这不会显着影响查询速度.

The reason that there are pid_high/pid_low fields instead of your standard primary key is because this service uses client-authoritative 128-bit GUIDs, but this does not impact the query speed significantly.

由于这是互联网,因此该服务上的绝大多数图像都是猫,并带有猫"标签.事实上,50,000 张图像中约有 47,000 张带有cat"标签.获取所有标记为cat"的图像的查询是

Since this is the internet, the vast majority of the images on this service are cats, and are tagged with 'cat'. In fact, about 47,000 out of 50,000 images are tagged with 'cat'. The query to get all images that is tagged with 'cat' is

select i.* from Tag t, TagBridge b, Image i 
where 
    b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low 
AND b.image_id_high = i.pid_high and b.image_id_low = i.pid_low 
AND t.name ='cat' 
order by uploadedDate DESC LIMIT 20;

这个查询计划是

sele  order          from  deta
----  -------------  ----  ----
0     0              0     SEARCH TABLE Tag AS t USING INDEX TagnameIndex (name=?) (~1 rows)
0     1              1     SCAN TABLE TagBridge AS b (~472 rows)
0     2              2     SEARCH TABLE Image AS i USING INDEX ImageTest (pid_high=? AND pid_low=?) (~1 rows)
0     0              0     USE TEMP B-TREE FOR ORDER BY

这里的主要问题是最后一行,USE TEMP B-TREE FOR ORDER BY.这会显着减慢查询速度.如果没有 'order by' 子句,整个查询大约需要 0.001 秒才能运行.使用 order by 子句,查询需要 0.483 秒,这是 400 倍的性能损失.

The main problem here is the last row, USE TEMP B-TREE FOR ORDER BY. This slows down the query significantly. Without the 'order by' clause, the entire query takes about 0.001s to run. With the order by clause, the query takes 0.483s, which is a 400x performance penalty.

我想在 0.1 秒内获得此查询,但我不确定如何.我尝试了许多其他查询,以及添加和删除索引,但这是我能够运行的最快的查询.

I would like to get this query under 0.1 seconds, but I am not sure how. I have tried many other queries, and adding and removing indices, but this is the fastest that I have been able to run.

推荐答案

这是一个在过滤和排序索引之间进行选择的普遍问题:

This is a general problem of choosing between filtering and ordering index:

您应该保留一个流行标签列表(排序索引对它们更有利),如果标签很流行,则以某种方式禁止过滤索引,例如:

You should keep a list of popular tags (for which the ordering index is more beneficial) and somehow forbid the filtering index if the tag is popular, say, like this:

SELECT  i.*
FROM    Tag t, TagBridge b, Image i 
WHERE   b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low 
        AND b.image_id_high = i.pid_high AND b.image_id_low = i.pid_low 
        AND t.name || '' = 'cat' 
ORDER BY
        i.uploadedDate DESC
LIMIT 20

或者,您可以对架构进行非规范化并将 uploadedDate 添加到 TagBridge,并使用触发器或其他内容填充它.然后在TagBridge (pid_high, pid_low, uploadDate, image_id_high, image_id_low)上创建一个复合索引,稍微改写一下查询:

Alternatively, you could denormalize your schema and add uploadedDate to TagBridge, filling it with a trigger or whatever. Then create a composite index on TagBridge (pid_high, pid_low, uploadedDate, image_id_high, image_id_low) and rewrite the query a little:

SELECT  i.*
FROM    TagBridge b, Image i
WHERE   b.tag_id_high =
        (
        SELECT  t.pid_high
        FROM    Tag t
        WHERE   t.name = 'cat'
        )
        AND b.tag_id_low =
        (
        SELECT  t.pid_low
        FROM    Tag t
        WHERE   t.name = 'cat'
        )
        AND i.pid_high = b.image_id_high
        AND i.pid_low = b.image_id_low
ORDER BY
        b.uploadedDate DESC
LIMIT 20;

双子查询是因为SQLite不理解元组语法.

The double subquery is because SQLite does not understand tuple syntax.

这篇关于从 SQLite 查询中删除临时 B 树排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆