PostgreSQL 9.4 query gets progressively slower when joining TSTZRANGE with &&

Problem description

I am running a query that gets progressively slower as records are added. Records are added continuously via an automated process (bash calling psql). I would like to correct this bottleneck; however, I don't know what my best option is.

This is the output from pgBadger:

Hour    Count   Duration    Avg duration
00      9,990   10m3s       60ms     <---ignore this hour
02      1       60ms        60ms     <---ignore this hour
03      4,638   1m54s       24ms     <---queries begin with table empty
04      30,991  55m49s      108ms    <---first full hour of queries running
05      13,497  58m3s       258ms
06      9,904   58m32s      354ms
07      10,542  58m25s      332ms
08      8,599   58m42s      409ms
09      7,360   58m52s      479ms
10      6,661   58m57s      531ms
11      6,133   59m2s       577ms
12      5,601   59m6s       633ms
13      5,327   59m9s       666ms
14      4,964   59m12s      715ms
15      4,759   59m14s      746ms
16      4,531   59m17s      785ms
17      4,330   59m18s      821ms
18      939     13m16s      848ms

The table structure is as follows:

CREATE TABLE "Parent" (
    "ParentID" SERIAL PRIMARY KEY,
    "Details1" VARCHAR
);

Table "Parent" has a one-to-many relationship with table "Foo":

CREATE TABLE "Foo" (
    "FooID" SERIAL PRIMARY KEY,
    "ParentID" int4 NOT NULL REFERENCES "Parent" ("ParentID"),
    "Details1" VARCHAR
);

Table "Foo" has a one-to-many relationship with table "Bar":

CREATE TABLE "Bar" (
    "FooID" int8 NOT NULL REFERENCES "Foo" ("FooID"),
    "Timerange" tstzrange NOT NULL,
    "Detail1" VARCHAR,
    "Detail2" VARCHAR,
    CONSTRAINT "Bar_pkey" PRIMARY KEY ("FooID", "Timerange")
);
CREATE INDEX  "Bar_FooID_Timerange_idx" ON "Bar" USING gist("FooID", "Timerange");

Additionally, table "Bar" may not contain overlapping "Timespan" values for the same "FooID" or "ParentID". I have created a trigger that fires after any INSERT, UPDATE, or DELETE and prevents overlapping ranges.

The trigger includes a section that looks similar to this:

WITH
    "cte" AS (
        SELECT
            "Foo"."FooID",
            "Foo"."ParentID",
            "Foo"."Details1",
            "Bar"."Timespan"
        FROM
            "Foo"
            JOIN "Bar" ON "Foo"."FooID" = "Bar"."FooID"
        WHERE
            "Foo"."FooID" = 1234
    )
SELECT
    "Foo"."FooID",
    "Foo"."ParentID",
    "Foo"."Details1",
    "Bar"."Timespan"
FROM
    "cte"
    JOIN "Foo" ON 
        "cte"."ParentID" = "Foo"."ParentID"
        AND "cte"."FooID" <> "Foo"."FooID"
    JOIN "Bar" ON
        "Foo"."FooID" = "Bar"."FooID"
        AND "cte"."Timespan" && "Bar"."Timespan";

Results of EXPLAIN ANALYSE:

Nested Loop  (cost=7258.08..15540.26 rows=1 width=130) (actual time=8.052..147.792 rows=1 loops=1)
  Join Filter: ((cte."FooID" <> "Foo"."FooID") AND (cte."ParentID" = "Foo"."ParentID"))
  Rows Removed by Join Filter: 76
  CTE cte
    ->  Nested Loop  (cost=0.68..7257.25 rows=1000 width=160) (actual time=1.727..1.735 rows=1 loops=1)
          ->  Function Scan on "fn_Bar"  (cost=0.25..10.25 rows=1000 width=104) (actual time=1.699..1.701 rows=1 loops=1)
          ->  Index Scan using "Foo_pkey" on "Foo" "Foo_1"  (cost=0.42..7.24 rows=1 width=64) (actual time=0.023..0.025 rows=1 loops=1)
                Index Cond: ("FooID" = "fn_Bar"."FooID")
  ->  Nested Loop  (cost=0.41..8256.00 rows=50 width=86) (actual time=1.828..147.188 rows=77 loops=1)
        ->  CTE Scan on cte  (cost=0.00..20.00 rows=1000 width=108) (actual time=1.730..1.740 rows=1 loops=1)
   **** ->  Index Scan using "Bar_FooID_Timerange_idx" on "Bar"  (cost=0.41..8.23 rows=1 width=74) (actual time=0.093..145.314 rows=77 loops=1)
              Index Cond: ((cte."Timespan" && "Timespan"))
  ->  Index Scan using "Foo_pkey" on "Foo"  (cost=0.42..0.53 rows=1 width=64) (actual time=0.004..0.005 rows=1 loops=77)
        Index Cond: ("FooID" = "Bar"."FooID")
Planning time: 1.490 ms
Execution time: 147.869 ms

(**** emphasis mine)

This seems to show that 99% of the work is done in the JOIN from "cte" to "Bar" (via "Foo"). It is already using the appropriate index, yet it is still too slow.

So I ran:

SELECT 
    pg_size_pretty(pg_relation_size('"Bar"')) AS "Table",
    pg_size_pretty(pg_relation_size('"Bar_FooID_Timerange_idx"')) AS "Index";

Result:

    Table    |    Index
-------------|-------------
 283 MB      | 90 MB

Does an index of this size (relative to the table) offer much in terms of read performance? I was considering a pseudo-partition where the index is replaced with several partial indexes. Maybe the partial indexes would have less to maintain (and read) and performance would improve. I have never seen this done; it is just an idea. If this is an option, I can't think of any good way to delimit the segments, given that this would be on a TSTZRANGE value.

I also think adding the "ParentID" to "Bar" would speed things up, but I don't want to denormalize.

What other options do I have?

At peak performance (hour 18:00), the process was consistently adding 14.5 records per second, up from 1.15 records per second.

This was the result of:

  1. Adding "ParentID" to table "Bar"
  2. Adding a foreign key constraint to "Foo" ("ParentID", "FooID")
  3. Adding EXCLUDE USING gist ("ParentID" WITH =, "Timerange" WITH &&) DEFERRABLE INITIALLY DEFERRED (btree_gist module already installed)

Recommended answer

Exclusion constraint

Additionally, table "Bar" may not contain overlapping "Timespan" values for the same "FooID" or "ParentID". I have created a trigger that fires after any INSERT, UPDATE, or DELETE that prevents overlapping ranges.

I suggest you use an exclusion constraint instead, which is much simpler, safer and faster:

You need to install the additional module btree_gist first.
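As a minimal sketch, the module can typically be installed once per database as shown here. This assumes the contrib packages are present on the server and that the role running the statement is allowed to create extensions.

```sql
-- Install btree_gist in the current database (no-op if already present).
CREATE EXTENSION IF NOT EXISTS btree_gist;

-- Optional check: list the installed version.
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'btree_gist';
```

Without the module, a GiST index has no operator class for a plain integer column, so the "ParentID" WITH = part of the exclusion constraint cannot be created.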

And you need to include "ParentID" in the table "Bar" redundantly, which will be a small price to pay. Table definitions could look like this:

CREATE TABLE "Foo" (
   "FooID"    serial PRIMARY KEY,
   "ParentID" int4 NOT NULL REFERENCES "Parent",
   "Details1" varchar,
   CONSTRAINT foo_parent_foo_uni UNIQUE ("ParentID", "FooID")  -- required for FK
);

CREATE TABLE "Bar" (
   "ParentID"  int4 NOT NULL,
   "FooID"     int4 NOT NULL REFERENCES "Foo" ("FooID"),
   "Timerange" tstzrange NOT NULL,
   "Detail1"   varchar,
   "Detail2"   varchar,
   CONSTRAINT "Bar_pkey" PRIMARY KEY ("FooID", "Timerange"),
   CONSTRAINT bar_foo_fk
      FOREIGN KEY ("ParentID", "FooID") REFERENCES "Foo" ("ParentID", "FooID"),
   CONSTRAINT bar_parent_timerange_excl
      EXCLUDE USING gist ("ParentID" WITH =, "Timerange" WITH &&)
);

I also changed the data type for "Bar"."FooID" from int8 to int4. It references "Foo"."FooID", which is a serial, i.e. int4. Use the matching type int4 (or just integer) for several reasons, one of them being performance.

You don't need a trigger any more (at least not for this task), and you don't create the index "Bar_FooID_Timerange_idx" any more, since a suitable GiST index is created implicitly by the exclusion constraint.
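If the tables already exist and hold data, the same design could be retrofitted roughly as follows. This is only a sketch under several assumptions: btree_gist is installed, "Bar" currently contains no rows that overlap for the same "ParentID" (otherwise adding the exclusion constraint fails), and the old overlap-checking trigger is dropped separately, since its name is not given in the question.

```sql
BEGIN;

-- Target for the multicolumn foreign key.
ALTER TABLE "Foo"
   ADD CONSTRAINT foo_parent_foo_uni UNIQUE ("ParentID", "FooID");

-- Add the redundant column as nullable first, then backfill it from "Foo".
ALTER TABLE "Bar" ADD COLUMN "ParentID" int4;

UPDATE "Bar" b
SET    "ParentID" = f."ParentID"
FROM   "Foo" f
WHERE  f."FooID" = b."FooID";

-- Enforce the column, the FK and the no-overlap rule in one pass.
ALTER TABLE "Bar"
   ALTER COLUMN "ParentID" SET NOT NULL,
   ADD CONSTRAINT bar_foo_fk
      FOREIGN KEY ("ParentID", "FooID") REFERENCES "Foo" ("ParentID", "FooID"),
   ADD CONSTRAINT bar_parent_timerange_excl
      EXCLUDE USING gist ("ParentID" WITH =, "Timerange" WITH &&);

-- The old GiST index is superseded by the one backing the constraint.
DROP INDEX IF EXISTS "Bar_FooID_Timerange_idx";

COMMIT;
```

Running everything in one transaction means a failure at any step (for example, existing overlaps) leaves the schema unchanged.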

A btree index on ("ParentID", "FooID") will most probably be useful, though:

CREATE INDEX bar_parentid_fooid_idx ON "Bar" ("ParentID", "FooID");

I chose UNIQUE ("ParentID", "FooID") and not the other way round for a reason: there is already another index with leading "FooID" in either table.

Aside: I never use double-quoted CaMeL-case identifiers in Postgres. I only do it here to comply with your layout.

If you cannot or will not include "Bar"."ParentID" redundantly, there is another rogue way - on the condition that "Foo"."ParentID" is never updated. Make sure of that, with a trigger for instance.
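One way such a guard might look is sketched below. The function and trigger names are hypothetical, and the trigger only blocks ordinary UPDATE statements; it does nothing about a row being deleted and re-inserted under a different parent.

```sql
-- Hypothetical guard: reject any change to "Foo"."ParentID".
CREATE OR REPLACE FUNCTION trg_foo_parentid_readonly()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
   RAISE EXCEPTION '"Foo"."ParentID" must not be updated (FooID %)', OLD."FooID";
   RETURN NULL;  -- not reached
END
$func$;

CREATE TRIGGER foo_parentid_readonly
BEFORE UPDATE OF "ParentID" ON "Foo"
FOR EACH ROW
WHEN (OLD."ParentID" IS DISTINCT FROM NEW."ParentID")
EXECUTE PROCEDURE trg_foo_parentid_readonly();
```

The WHEN clause lets updates that set "ParentID" to its current value pass through untouched.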

You can fake an IMMUTABLE function:

CREATE OR REPLACE FUNCTION f_parent_of_foo(int)
  RETURNS int AS
'SELECT "ParentID" FROM public."Foo" WHERE "FooID" = $1'
  LANGUAGE sql IMMUTABLE;

I schema-qualified the table name to make sure, assuming public. Adapt to your schema.

Then use it in the exclusion constraint:

   CONSTRAINT bar_parent_timerange_excl
      EXCLUDE USING gist (f_parent_of_foo("FooID") WITH =, "Timerange" WITH &&)

While saving one redundant int4 column, the constraint will be more expensive to verify and the whole solution depends on more preconditions.

You could wrap INSERT and UPDATE into a plpgsql function and trap possible exceptions from the exclusion constraint (23P01 exclusion_violation) to handle it some way.

INSERT ...

EXCEPTION
    WHEN exclusion_violation
    THEN  -- handle conflict
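A fuller version of that fragment might look like the following sketch. The function name f_insert_bar and its parameter names are hypothetical, and it assumes the exclusion constraint is not deferred: with DEFERRABLE INITIALLY DEFERRED the violation is only raised at commit time, outside this function, and would not be trapped here.

```sql
-- Hypothetical wrapper: returns true if the row was inserted,
-- false if it overlaps an existing range for the same "ParentID".
CREATE OR REPLACE FUNCTION f_insert_bar(
   _parent_id int,
   _foo_id    int,
   _timerange tstzrange,
   _detail1   varchar DEFAULT NULL,
   _detail2   varchar DEFAULT NULL)
  RETURNS boolean
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO "Bar" ("ParentID", "FooID", "Timerange", "Detail1", "Detail2")
   VALUES (_parent_id, _foo_id, _timerange, _detail1, _detail2);
   RETURN true;
EXCEPTION
   WHEN exclusion_violation THEN
      -- Handle the conflict; here we only report it.
      RAISE NOTICE 'Range % overlaps an existing row for ParentID %',
                   _timerange, _parent_id;
      RETURN false;
END
$func$;
```

A call could then look like SELECT f_insert_bar(1, 1234, '[2015-01-01 00:00+00,2015-01-02 00:00+00)'); with the IDs and range chosen purely for illustration. Note that a block with an EXCEPTION clause sets up a subtransaction, which adds some overhead per call.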

In Postgres 9.5 you can handle INSERT directly with the new "UPSERT" implementation. The documentation:

The optional ON CONFLICT clause specifies an alternative action to raising a unique violation or exclusion constraint violation error. For each individual row proposed for insertion, either the insertion proceeds, or, if an arbiter constraint or index specified by conflict_target is violated, the alternative conflict_action is taken. ON CONFLICT DO NOTHING simply avoids inserting a row as its alternative action. ON CONFLICT DO UPDATE updates the existing row that conflicts with the row proposed for insertion as its alternative action.

But:

Note that exclusion constraints are not supported with ON CONFLICT DO UPDATE.

But you can still use ON CONFLICT DO NOTHING, thus avoiding possible exclusion_violation exceptions. Just check whether any row was actually inserted, which is cheaper:

INSERT ... 
ON CONFLICT ON CONSTRAINT bar_parent_timerange_excl DO NOTHING;

IF NOT FOUND THEN
   -- handle conflict
END IF;
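Put into a complete function, that pattern might look like this sketch for Postgres 9.5 or later. The function name f_insert_bar_or_skip and its parameters are hypothetical, and it assumes the constraint is declared as in the table definition above, i.e. not DEFERRABLE, since deferrable constraints cannot serve as arbiters for ON CONFLICT.

```sql
-- Hypothetical wrapper (Postgres 9.5+): returns true if the row was
-- inserted, false if the exclusion constraint made the INSERT a no-op.
CREATE OR REPLACE FUNCTION f_insert_bar_or_skip(
   _parent_id int,
   _foo_id    int,
   _timerange tstzrange,
   _detail1   varchar DEFAULT NULL,
   _detail2   varchar DEFAULT NULL)
  RETURNS boolean
  LANGUAGE plpgsql AS
$func$
BEGIN
   INSERT INTO "Bar" ("ParentID", "FooID", "Timerange", "Detail1", "Detail2")
   VALUES (_parent_id, _foo_id, _timerange, _detail1, _detail2)
   ON CONFLICT ON CONSTRAINT bar_parent_timerange_excl DO NOTHING;

   -- FOUND is true only if the INSERT affected at least one row.
   RETURN FOUND;
END
$func$;
```

Because no exception is trapped, this variant needs no EXCEPTION block and therefore avoids the subtransaction overhead.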

This example restricts the check to the given exclusion constraint. (I named the constraint explicitly for this purpose in the table definition above.) Other possible exceptions are not caught.
