Optimize query with OFFSET on large table

Question

I have a table:

create table big_table (
id serial primary key,
-- other columns here
vote int
); 

This table is very big, approximately 70 million rows, and I need to run this query:

SELECT * FROM big_table
ORDER BY vote [ASC|DESC], id [ASC|DESC]
OFFSET x LIMIT n  -- I need this for pagination

As you may know, queries like this are very slow when x is a large number.

To improve performance I added indexes:

create index vote_order_asc on big_table (vote asc, id asc);

create index vote_order_desc on big_table (vote desc, id desc);

EXPLAIN shows that the SELECT query above uses these indexes, but it is still very slow with a large offset.

What can I do to optimize queries with OFFSET on big tables? Maybe PostgreSQL 9.5 or even newer versions have some features for this? I've searched but didn't find anything.

Recommended answer

A large OFFSET is always going to be slow. Postgres has to order all rows and count the visible ones up to your offset. To skip all preceding rows directly, you could add an indexed row_number to the table (or create a MATERIALIZED VIEW that includes said row_number) and filter with WHERE row_number > x instead of OFFSET x.

However, this approach is only sensible for read-only (or mostly read-only) data. Implementing the same for table data that can change concurrently is more challenging. You need to start by defining the desired behavior exactly.
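For read-mostly data, the row_number idea above might look roughly like the following. This is only a sketch: the names big_table_numbered, rn, and big_table_numbered_rn_idx are made up for illustration, and the literal values stand in for x and n.

```sql
-- Hypothetical names: big_table_numbered, rn, big_table_numbered_rn_idx.
-- Number every row once, in the same order the pages are served in.
CREATE MATERIALIZED VIEW big_table_numbered AS
SELECT row_number() OVER (ORDER BY vote, id) AS rn, *
FROM   big_table;

-- Index the row number so that a range condition on it is cheap.
CREATE UNIQUE INDEX big_table_numbered_rn_idx ON big_table_numbered (rn);

-- A page of 20 rows after skipping the first 1000000 rows (x = 1000000, n = 20).
SELECT *
FROM   big_table_numbered
WHERE  rn > 1000000
ORDER  BY rn
LIMIT  20;

-- The numbering is a snapshot; it has to be rebuilt after big_table changes.
REFRESH MATERIALIZED VIEW big_table_numbered;
```

The refresh recomputes the whole view, which is why this only pays off when the data changes rarely.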

I suggest a different approach for pagination:

SELECT *
FROM   big_table
WHERE  (vote, id) > (vote_x, id_x)  -- ROW values
ORDER  BY vote, id  -- needs to be deterministic
LIMIT  n;

Here vote_x and id_x are taken from the last row of the previous page (for both DESC and ASC), or from the first row of the current page when navigating backwards.
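Navigating backwards could be sketched as follows, assuming pages are displayed in ascending order. The values 42 and 1000 are placeholders for the vote and id of the first row of the current page, and 20 is a placeholder page size.

```sql
-- Previous page: the 20 rows that sort immediately before the first row
-- of the current page. 42 / 1000 are placeholder values for that row.
SELECT *
FROM  (
   SELECT *
   FROM   big_table
   WHERE  (vote, id) < (42, 1000)   -- ROW value comparison
   ORDER  BY vote DESC, id DESC     -- walk backwards from the boundary row
   LIMIT  20
   ) sub
ORDER  BY vote, id;                 -- restore ascending display order
```

The inner query picks the nearest preceding rows; the outer ORDER BY only re-sorts that one page, so it stays cheap.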

Comparing row values is supported by the index you already have. It is a feature that complies with the ISO SQL standard, but not every RDBMS supports it.

CREATE INDEX vote_order_asc ON big_table (vote, id);

Or for descending order:

SELECT *
FROM   big_table
WHERE  (vote, id) < (vote_x, id_x)  -- ROW values
ORDER  BY vote DESC, id DESC
LIMIT  n;

This can use the same index.
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST|LAST construct.
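The reason NULLs matter here: a row comparison such as (vote, id) > (vote_x, id_x) evaluates to NULL, not true, when vote is NULL, so rows with a NULL vote would never show up on any page. A minimal sketch of the NOT NULL route, assuming no NULL votes exist yet (the statement fails otherwise):

```sql
-- id is already NOT NULL because it is the primary key;
-- vote is nullable in the table definition shown in the question.
-- Assumption: big_table currently contains no rows with vote IS NULL.
ALTER TABLE big_table ALTER COLUMN vote SET NOT NULL;
```

On a table of this size the statement takes a strong lock and typically has to check the existing rows, so it is worth scheduling rather than running ad hoc.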

Note two things in particular:

  1. The ROW values in the WHERE clause cannot be replaced with separate member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:

WHERE  vote >= vote_x
AND    id   > id_x

That would rule out all rows with id <= id_x, while we only want to do that for rows with the same vote and not for higher votes. The correct translation would be:

WHERE (vote = vote_x AND id > id_x) OR vote > vote_x

... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.

It would be simple for a single column, obviously. That's the special case I mentioned at the outset.
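The difference between the three predicates can be checked without touching the real table. The following sketch runs them side by side over a few made-up (vote, id) pairs, with (2, 2) as a made-up boundary row:

```sql
-- Compare the ROW value predicate, its correct expansion, and the
-- incorrect field-by-field version on a handful of sample rows.
SELECT vote, id,
       (vote, id) > (2, 2)               AS row_cmp,
       (vote = 2 AND id > 2) OR vote > 2 AS expanded,
       vote >= 2 AND id > 2              AS naive
FROM  (VALUES (1, 3), (2, 1), (2, 3), (3, 1)) AS t(vote, id);
-- row_cmp and expanded agree on every row.
-- naive differs for (3, 1): it wrongly excludes a row with a higher vote.
```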

  2. The technique does not work for mixed directions in ORDER BY, such as:

ORDER  BY vote ASC, id DESC

At least I can't think of a generic way to implement this as efficiently. If at least one of the two columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) and use the same expression in ORDER BY:

ORDER  BY vote ASC, (id * -1) ASC
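Put together, the inverted-value workaround might look like this. It is a sketch under assumptions: the index name vote_asc_id_desc is made up, 42 and 1000 are placeholders for the vote and id of the last row of the previous page, and 20 is a placeholder page size. Whether the planner actually picks the index is worth confirming with EXPLAIN.

```sql
-- Hypothetical index name. Storing id * -1 turns "id DESC" into an
-- ascending sort, so both index columns run in the same direction.
CREATE INDEX vote_asc_id_desc ON big_table (vote, (id * -1));

-- Next page for ORDER BY vote ASC, id DESC, written with the same
-- expression on both sides of the ROW value comparison.
SELECT *
FROM   big_table
WHERE  (vote, id * -1) > (42, 1000 * -1)
ORDER  BY vote ASC, (id * -1) ASC
LIMIT  20;
```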

Related: note in particular the presentation by Markus Winand.
