优化大型表最近行查询的性能 [英] Optimize performance for queries on recent rows of a large table
问题描述
我有一张大桌子:
CREATE TABLE "orders" (
"id" serial NOT NULL,
"person_id" int4,
"created" int4,
CONSTRAINT "orders_pkey" PRIMARY KEY ("id")
);
所有请求中的90%与最近2-3天的<$ c $订单有关c> person_id ,例如:
90% of all requests are about orders from the last 2-3 days by a person_id
, like:
select * from orders
where person_id = 1
and created >= extract(epoch from current_timestamp)::int - 60 * 60 * 24 * 3;
如何提高效果?
我知道分区,但是现有行呢?看来我需要每2-3天手动创建 INHERITS
表。
I know about Partitioning, but what about existing rows? And it looks like I need to create INHERITS
tables manually every 2-3 days.
推荐答案
A 部分多列在(创建的人_id)
上具有伪< IMMUTABLE
条件的索引 会有所帮助(很多)。需要不时地重新创建以保持性能。
A partial, multicolumn index on (person_id, created)
with a pseudo-IMMUTABLE
condition would help (a lot). Needs to be recreated from time to time to keep performance up.
注意,如果表不是很大,则可以在很大程度上简化并使用普通的多列索引。
或考虑在表分区 Postgres 12或更高版本(功能最终成熟)。
Note, if your table is not very big, you can largely simplify and use a plain multicolumn index.
Or consider table partitioning in Postgres 12 or later (where the feature finally matured).
原始的函数提供恒定的时间点3或再过几天(在您的情况下,以Unix纪元表示):
A primitive function provides a constant point in time, 3 or more days back (represented by a unix epoch in your case):
CREATE OR REPLACE FUNCTION f_orders_idx_start()
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE COST 1 AS
'SELECT 1387497600';
并行安全
仅适用于Postgres 10或更高版本。
1387497600
是以下结果的结果:
PARALLEL SAFE
only for Postgres 10 or later.
1387497600
being the result of:
SELECT extract(epoch from now())::integer - 259200;
-- 259200 being the result of 60 * 60 * 24 * 3
<在此伪 IMMUTABLE
条件下,strong>局部索引:
Base your partial index on this pseudo-IMMUTABLE
condition:
CREATE INDEX orders_created_recent_idx ON orders (person_id, created)
WHERE created >= f_orders_idx_start();
在相同条件下基于查询:
SELECT *
FROM orders
WHERE person_id = 1
AND created >= f_orders_idx_start() -- match partial idx condition
AND created >= extract(epoch from now())::integer - 259200; -- actual condition
行并创建> = f_orders_idx_start()
似乎是多余的,但有助于说服Postgres使用部分索引。
The line AND created >= f_orders_idx_start()
seems redundant, but is instrumental to convince Postgres to use the partial index.
一个函数来重新创建函数和索引时。可能每天晚上都有cron作业:
A function to recreate function and index from time to time. Possibly with a cron-job every night:
CREATE OR REPLACE FUNCTION f_orders_reindex_partial()
RETURNS void AS
$func$
DECLARE
-- 3 days back, starting at 00:00
_start int := extract(epoch from now()::date -3)::int;
BEGIN
IF _start = f_orders_idx_start() THEN
-- do nothing, nothing changes.
ELSE
DROP INDEX IF EXISTS orders_created_recent_idx;
-- Recreate IMMUTABLE function
EXECUTE format('
CREATE OR REPLACE FUNCTION f_orders_idx_start()
RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE COST 1 AS
$$SELECT %s $$'
, _start
);
-- Recreate partial index
CREATE INDEX orders_created_recent_idx ON orders (person_id, created)
WHERE created >= f_orders_idx_start();
END IF;
END
$func$ LANGUAGE plpgsql;
然后,重新建立索引,调用(理想情况下并发负载很少或没有):
Then, to rebase your index, call (ideally with little or no concurrent load):
SELECT f_orders_reindex_partial(); -- that's all
如果由于并发负载而无法删除和重新创建索引,请考虑重新索引
在Postgres 12或更高版本中。简直太简单了:
If you cannot afford dropping and recreating the index due to concurrent load, consider REINDEX CONCURRENTLY
in Postgres 12 or later. It's dead simple:
REINDEX INDEX orders_created_recent_idx;
即使您从未调用此函数,所有查询仍将继续起作用。随着部分索引的增长,性能会随着时间的推移而缓慢下降。
All queries continue to work, even if you never call this function. Performance slowly deteriorates over time with the growing partial index.
我正在成功地使用这种机制,并具有几个大表和类似的要求。 非常快。
I am using this regime successfully with a couple of big tables and similar requirements. Very fast.
对于Postgres 9.2或更高版本,并且如果您的表只有很少的小列,并且该表的写入量不大,则可能需要确保 覆盖索引 :
For Postgres 9.2 or later, and if your table has only few, small columns, and if the table is not heavily written, it might pay to make that a covering index:
CREATE INDEX orders_created_recent_idx ON orders (person_id, created, id)
WHERE created >= f_orders_idx_start();
,您可能要使用 INCLUDE
代替:
In Postgres 11 or later, you might want to use INCLUDE
instead:
CREATE INDEX orders_created_recent_idx ON orders (person_id, created) INCLUDE (id)
WHERE created >= f_orders_idx_start();
这篇关于优化大型表最近行查询的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!