优化大型表最近行查询的性能 [英] Optimize performance for queries on recent rows of a large table

查看:102
本文介绍了优化大型表最近行查询的性能的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一张大桌子:

CREATE TABLE "orders" (
"id" serial NOT NULL,
"person_id" int4,
"created" int4,
CONSTRAINT "orders_pkey" PRIMARY KEY ("id")
);

所有请求中的90%与最近2-3天的<$ c $订单有关c> person_id ,例如:

90% of all requests are about orders from the last 2-3 days by a person_id, like:

select * from orders
where person_id = 1
and created >= extract(epoch from current_timestamp)::int - 60 * 60 * 24 * 3;

如何提高效果?

我知道分区,但是现有行呢?看来我需要每2-3天手动创建 INHERITS 表。

I know about Partitioning, but what about existing rows? And it looks like I need to create INHERITS tables manually every 2-3 days.

推荐答案

A 部分多列在(创建的人_id)上具有伪< IMMUTABLE 条件的索引 会有所帮助(很多)。需要不时地重新创建以保持性能。

A partial, multicolumn index on (person_id, created) with a pseudo-IMMUTABLE condition would help (a lot). Needs to be recreated from time to time to keep performance up.

注意,如果表不是很大,则可以在很大程度上简化并使用普通的多列索引。

或考虑在表分区 Postgres 12或更高版本(功能最终成熟)。

Note, if your table is not very big, you can largely simplify and use a plain multicolumn index.
Or consider table partitioning in Postgres 12 or later (where the feature finally matured).

原始的函数提供恒定的时间点3或再过几天(在您的情况下,以Unix纪元表示):

A primitive function provides a constant point in time, 3 or more days back (represented by a unix epoch in your case):

CREATE OR REPLACE FUNCTION f_orders_idx_start()
  RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE COST 1 AS
'SELECT 1387497600';

并行安全仅适用于Postgres 10或更高版本。

1387497600 是以下结果的结果:

PARALLEL SAFE only for Postgres 10 or later.
1387497600 being the result of:

SELECT extract(epoch from now())::integer - 259200;
-- 259200 being the result of 60 * 60 * 24 * 3

<在此伪 IMMUTABLE 条件下,strong>局部索引:

Base your partial index on this pseudo-IMMUTABLE condition:

CREATE INDEX orders_created_recent_idx ON orders (person_id, created)
WHERE created >= f_orders_idx_start();

在相同条件下基于查询

SELECT *
FROM   orders
WHERE  person_id = 1
AND    created >= f_orders_idx_start()  -- match partial idx condition
AND    created >= extract(epoch from now())::integer - 259200;  -- actual condition

并创建> = f_orders_idx_start()似乎是多余的,但有助于说服Postgres使用部分索引。

The line AND created >= f_orders_idx_start() seems redundant, but is instrumental to convince Postgres to use the partial index.

一个函数来重新创建函数和索引时。可能每天晚上都有cron作业:

A function to recreate function and index from time to time. Possibly with a cron-job every night:

CREATE OR REPLACE FUNCTION f_orders_reindex_partial()
  RETURNS void AS
$func$
DECLARE
   -- 3 days back, starting at 00:00
   _start int := extract(epoch from now()::date -3)::int;
BEGIN       
   IF _start = f_orders_idx_start() THEN
      -- do nothing, nothing changes.
   ELSE
      DROP INDEX IF EXISTS orders_created_recent_idx;
      -- Recreate IMMUTABLE function
      EXECUTE format('
         CREATE OR REPLACE FUNCTION f_orders_idx_start()
           RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE COST 1 AS
         $$SELECT %s $$'
       , _start
      );
      -- Recreate partial index
      CREATE INDEX orders_created_recent_idx ON orders (person_id, created)
      WHERE created >= f_orders_idx_start();
   END IF;    
END
$func$  LANGUAGE plpgsql;

然后,重新建立索引,调用(理想情况下并发负载很少或没有):

Then, to rebase your index, call (ideally with little or no concurrent load):

SELECT f_orders_reindex_partial();  -- that's all

如果由于并发负载而无法删除和重新创建索引,请考虑重新索引在Postgres 12或更高版本中。简直太简单了:

If you cannot afford dropping and recreating the index due to concurrent load, consider REINDEX CONCURRENTLY in Postgres 12 or later. It's dead simple:

REINDEX INDEX orders_created_recent_idx;

即使您从未调用此函数,所有查询仍将继续起作用。随着部分索引的增长,性能会随着时间的推移而缓慢下降。

All queries continue to work, even if you never call this function. Performance slowly deteriorates over time with the growing partial index.

我正在成功地使用这种机制,并具有几个大表和类似的要求。 非常快。

I am using this regime successfully with a couple of big tables and similar requirements. Very fast.

对于Postgres 9.2或更高版本,并且如果您的表只有很少的小列,并且该表的写入量不大,则可能需要确保 覆盖索引

For Postgres 9.2 or later, and if your table has only few, small columns, and if the table is not heavily written, it might pay to make that a covering index:

CREATE INDEX orders_created_recent_idx ON orders (person_id, created, id)
WHERE created >= f_orders_idx_start();

,您可能要使用 INCLUDE 代替:

In Postgres 11 or later, you might want to use INCLUDE instead:

CREATE INDEX orders_created_recent_idx ON orders (person_id, created) INCLUDE (id)
WHERE created >= f_orders_idx_start();

这篇关于优化大型表最近行查询的性能的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆