使用大表上的OFFSET优化查询 [英] Optimize query with OFFSET on large table

查看:107
本文介绍了使用大表上的OFFSET优化查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有桌子

create table big_table (
id serial primary key,
-- other columns here
vote int
); 

此表非常大,大约有7,000万行,我需要查询:

This table is very big, approximately 70 million rows, I need to query:

SELECT * FROM big_table
ORDER BY vote [ASC|DESC], id [ASC|DESC]
OFFSET x LIMIT n  -- I need this for pagination

您可能知道,当x很大时,这样的查询非常慢.

As you may know, when x is a large number, queries like this are very slow.

为了优化性能,我添加了索引:

For performance optimization I added indexes:

create index vote_order_asc on big_table (vote asc, id asc);

create index vote_order_desc on big_table (vote desc, id desc);

EXPLAIN表明上面的SELECT查询使用这些索引,但是无论如何,偏移很大都会很慢.

EXPLAIN shows that the above SELECT query uses these indexes, but it's very slow anyway with a large offset.

我该怎么做才能在大表中使用OFFSET优化查询?也许PostgreSQL 9.5或更高版本具有某些功能?我已经搜索了,但是没有找到任何东西.

What can I do to optimize queries with OFFSET in big tables? Maybe PostgreSQL 9.5 or even newer versions have some features? I've searched but didn't find anything.

推荐答案

大型OFFSET总是很慢. Postgres必须对所有行进行排序,并计算 visible 直到您的偏移量为止.要直接 跳过所有先前的行,可以向表中添加索引的row_number(或创建

A large OFFSET is always going to be slow. Postgres has to order all rows and count the visible ones up to your offset. To skip all previous rows directly you could add an indexed row_number to the table (or create a MATERIALIZED VIEW including said row_number) and work with WHERE row_number > x instead of OFFSET x.

但是,这种方法仅对只读(或大部分)数据有意义.为可以同时更改 的表数据实现相同的操作更具挑战性.您需要先<精确地定义所需的行为.

However, this approach is only sensible for read-only (or mostly) data. Implementing the same for table data that can change concurrently is more challenging. You need to start by defining desired behavior exactly.

我建议使用不同的方法分页:

I suggest a different approach for pagination:

SELECT *
FROM   big_table
WHERE  (vote, id) > (vote_x, id_x)  -- ROW values
ORDER  BY vote, id  -- needs to be deterministic
LIMIT  n;

vote_xid_x来自 上一页 最后行(对于DESC).或从 first (如果向后导航 ).

Where vote_x and id_x are from the last row of the previous page (for both DESC and ASC). Or from the first if navigating backwards.

已有的索引支持比较行值-一种符合ISO SQL标准的功能,但并非每个RDBMS都支持它.

Comparing row values is supported by the index you already have - a feature that complies with the ISO SQL standard, but not every RDBMS supports it.

CREATE INDEX vote_order_asc ON big_table (vote, id);

或按降序排列:

SELECT *
FROM   big_table
WHERE  (vote, id) < (vote_x, id_x)  -- ROW values
ORDER  BY vote DESC, id DESC
LIMIT  n;

可以使用相同的索引.
我建议您声明列NOT NULL或熟悉NULLS FIRST|LAST构造:

Can use the same index.
I suggest you declare your columns NOT NULL or acquaint yourself with the NULLS FIRST|LAST construct:

请注意两件事:

  1. WHERE子句中的ROW值不能用分隔的成员字段替换. WHERE (vote, id) > (vote_x, id_x) 不能 替换为:

  1. The ROW values in the WHERE clause cannot be replaced with separated member fields. WHERE (vote, id) > (vote_x, id_x) cannot be replaced with:

WHERE  vote >= vote_x
AND    id   > id_x

这将排除所有行和id <= id_x行,而我们只希望对同一票进行表决,而不是对下一票进行表决.正确的翻译应该是:

That would rule out all rows with id <= id_x, while we only want to do that for the same vote and not for the next. The correct translation would be:

WHERE (vote = vote_x AND id > id_x) OR vote > vote_x

...不能很好地与索引配合使用,并且随着列的增加而变得越来越复杂.

... which doesn't play along with indexes as nicely, and gets increasingly complicated for more columns.

很简单.那是我一开始提到的特殊情况.

Would be simple for a single column, obviously. That's the special case I mentioned at the outset.

该技术不适用于ORDER BY中的混合方向,例如:

The technique does not work for mixed directions in ORDER BY like:

ORDER  BY vote ASC, id DESC

至少我想不出一种通用的方法来有效地实现这一目标.如果两列中至少有一列是数字类型,则可以在(vote, (id * -1))上使用带倒数值的函数索引,并在ORDER BY中使用相同的表达式:

At least I can't think of a generic way to implement this as efficiently. If at least one of both columns is a numeric type, you could use a functional index with an inverted value on (vote, (id * -1)) - and use the same expression in ORDER BY:

ORDER  BY vote ASC, (id * -1) ASC

相关:

  • SQL syntax term for 'WHERE (col1, col2) < (val1, val2)'
  • Improve performance for order by with columns from many tables

尤其要注意Markus Winand的演讲,我链接到:

Note in particular the presentation by Markus Winand I linked to:

这篇关于使用大表上的OFFSET优化查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆