从 SQLite 查询中删除临时 B 树排序 [英] Removing a Temporary B Tree Sort from a SQLite Query
问题描述
我有一个非常基本的图片上传服务实现,您可以在其中上传图片并标记它们.这是我的架构:
I have a pretty basic implementation of an Image uploading service, where you can upload images and tag them. This is my schema:
CREATE TABLE Tag(
orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
pid_high UNSIGNED BIG INT NOT NULL,
pid_low UNSIGNED BIG INT NOT NULL,
name STRING NOT NULL,
CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE TagBridge(
orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
pid_high UNSIGNED BIG INT NOT NULL,
pid_low UNSIGNED BIG INT NOT NULL,
image_id_high UNSIGNED BIG INT NOT NULL,
image_id_low UNSIGNED BIG INT NOT NULL,
tag_id_high UNSIGNED BIG INT NOT NULL,
tag_id_low UNSIGNED BIG INT NOT NULL,
CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
CREATE TABLE Image(
orm_id INTEGER PRIMARY KEY AUTOINCREMENT,
pid_high UNSIGNED BIG INT NOT NULL,
pid_low UNSIGNED BIG INT NOT NULL,
filehash STRING NOT NULL,
mime STRING NOT NULL,
uploadedDate INTEGER NOT NULL,
ratingsAverage REAL,
CONSTRAINT KeyConstraint UNIQUE (pid_high, pid_low) ON CONFLICT FAIL);
和指数
CREATE INDEX ImageTest on Image(pid_high, pid_low, uploadedDate DESC);
CREATE INDEX ImagefilehashIndex ON Image (filehash);
CREATE INDEX ImageuploadedDateIndex ON Image (uploadedDate);
CREATE INDEX TagnameIndex ON Tag (name);
使用 pid_high/pid_low 字段而不是标准主键的原因是该服务使用客户端权威的 128 位 GUID,但这不会显着影响查询速度.
The reason that there are pid_high/pid_low fields instead of your standard primary key is because this service uses client-authoritative 128-bit GUIDs, but this does not impact the query speed significantly.
由于这是互联网,因此该服务上的绝大多数图像都是猫,并带有猫"标签.事实上,50,000 张图像中约有 47,000 张带有cat"标签.获取所有标记为cat"的图像的查询是
Since this is the internet, the vast majority of the images on this service are cats, and are tagged with 'cat'. In fact, about 47,000 out of 50,000 images are tagged with 'cat'. The query to get all images that is tagged with 'cat' is
select i.* from Tag t, TagBridge b, Image i
where
b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
AND b.image_id_high = i.pid_high and b.image_id_low = i.pid_low
AND t.name ='cat'
order by uploadedDate DESC LIMIT 20;
这个查询计划是
sele order from deta
---- ------------- ---- ----
0 0 0 SEARCH TABLE Tag AS t USING INDEX TagnameIndex (name=?) (~1 rows)
0 1 1 SCAN TABLE TagBridge AS b (~472 rows)
0 2 2 SEARCH TABLE Image AS i USING INDEX ImageTest (pid_high=? AND pid_low=?) (~1 rows)
0 0 0 USE TEMP B-TREE FOR ORDER BY
这里的主要问题是最后一行,USE TEMP B-TREE FOR ORDER BY.这会显着减慢查询速度.如果没有 'order by' 子句,整个查询大约需要 0.001 秒才能运行.使用 order by 子句,查询需要 0.483 秒,这是 400 倍的性能损失.
The main problem here is the last row, USE TEMP B-TREE FOR ORDER BY. This slows down the query significantly. Without the 'order by' clause, the entire query takes about 0.001s to run. With the order by clause, the query takes 0.483s, which is a 400x performance penalty.
我想在 0.1 秒内获得此查询,但我不确定如何.我尝试了许多其他查询,以及添加和删除索引,但这是我能够运行的最快的查询.
I would like to get this query under 0.1 seconds, but I am not sure how. I have tried many other queries, and adding and removing indices, but this is the fastest that I have been able to run.
推荐答案
这是一个在过滤和排序索引之间进行选择的普遍问题:
This is a general problem of choosing between filtering and ordering index:
您应该保留一个流行标签列表(排序索引对它们更有利),如果标签很流行,则以某种方式禁止过滤索引,例如:
You should keep a list of popular tags (for which the ordering index is more beneficial) and somehow forbid the filtering index if the tag is popular, say, like this:
SELECT i.*
FROM Tag t, TagBridge b, Image i
WHERE b.tag_id_high = t.pid_high AND b.tag_id_low = t.pid_low
AND b.image_id_high = i.pid_high AND b.image_id_low = i.pid_low
AND t.name || '' = 'cat'
ORDER BY
i.uploadedDate DESC
LIMIT 20
或者,您可以对架构进行非规范化并将 uploadedDate
添加到 TagBridge
,并使用触发器或其他内容填充它.然后在TagBridge (pid_high, pid_low, uploadDate, image_id_high, image_id_low)
上创建一个复合索引,稍微改写一下查询:
Alternatively, you could denormalize your schema and add uploadedDate
to TagBridge
, filling it with a trigger or whatever. Then create a composite index on TagBridge (pid_high, pid_low, uploadedDate, image_id_high, image_id_low)
and rewrite the query a little:
SELECT i.*
FROM TagBridge b, Image i
WHERE b.tag_id_high =
(
SELECT t.pid_high
FROM Tag t
WHERE t.name = 'cat'
)
AND b.tag_id_low =
(
SELECT t.pid_low
FROM Tag t
WHERE t.name = 'cat'
)
AND i.pid_high = b.image_id_high
AND i.pid_low = b.image_id_low
ORDER BY
b.uploadedDate DESC
LIMIT 20;
双子查询是因为SQLite
不理解元组语法.
The double subquery is because SQLite
does not understand tuple syntax.
这篇关于从 SQLite 查询中删除临时 B 树排序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!