Postgres分层(jsonb)CTE不必要地变慢 [英] Postgres hierarchical (jsonb) CTE unnecessarily slow

查看:84
本文介绍了Postgres分层(jsonb)CTE不必要地变慢的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我的表中有一个JsonB列,其中包含分层信息。

  MyTable(id uuid,索引jsonb,内容bytea )

现在,如果我创建CTE,则说

 具有递归层次结构(pid,id,content)AS(
-加载第一个父项
SELECT t.indexes->>'parentId'作为pId ,t.id,t.content来自MyTable c
加入MyTable t在t.indexes->>'Id'= c.indexes->>'parentId'
WHERE c.Id = ANY('{.. Some UUIDS}')
UNION
选择t.indexes->>'parentId'作为pId,t.id,t.content
来自层次结构h ,MyTable t
WHERE t.indexes->>'Id'= h.pid
)SELECT ID,来自层次结构


现在从300K条记录的表中的2个节点构建父树的示例过程大约需要10秒钟。



现在,如果我创建索引

 创建索引MyTable上的MyIndex 
使用btree
(((索引-> ‘Id’)

这将时间缩短至4.5s。
这将产生对

 ->的分析。递归联合(cost = 23.81..4528423.71行= 80794929宽度= 1219)(实际时间= 0.188..1802.636行= 5循环= 1)
->嵌套循环(cost = 23.81..3150.15行= 899宽度= 1219)(实际时间= 0.132..0.133行= 1循环= 1)
输出:(t.indexes->>'parentId': :text),t.id,t.content
->使用 TEST上的 MyTable_pkey进行索引扫描。 MyTable c(成本= 0.42..8.44行= 1宽度= 123)(实际时间= 0.053..0.053行= 1循环= 1)
输出: c.id,c.content,c.indexes
索引条件:(c.id = ANY('{1c725f08-0324-41e9-b417-5ec885fb1cc9}':: uuid []))
- >在 TEST。 MyTable t上进行位图堆扫描(成本= 23.39..3130.48行= 899宽度= 1219)(实际时间= 0.066..0.066行= 1循环= 1)
输出:t.id ,t.content,t.indexes
重新检查条件:((((t.indexes->>'Id':: text)=(c.indexes->>'parentId':: text) ))
堆块:精确= 1
->对 MyIndex进行位图索引扫描(成本= 0.00..23.17行= 899宽度= 0)(实际时间= 0.055..0.055行= 1循环= 1)
Index Cond:(((t.indexes-> ;>'Id':: text)=(c.indexes->>'parentId':: text))

//联合部分
->合并联接(成本= 770.60..290937.50行= 8079403宽度= 1219)(实际时间= 360.467..360.476行= 1循环= 5)
输出:(t_1.indexes->>'parentId': :text),t_1.id,t_1.content
合并条件:((t_1.indexes->>'Id':: text)= h.pid)
->使用 TEST上的 MyIndex进行索引扫描。 MyTable t_1(成本= 0.42..127680.55行= 179742宽度= 1219)(实际时间= 0.019..288.168行= 60478循环= 5)
输出: t_1.id,t_1.sourceid,t_1.content,t_1.indexes
->排序(cost = 770.18..792.65行= 8990宽度= 32)(实际时间= 0.010..0.011行= 1循环= 5)
输出:h.pid
排序键:h.pid
排序方法:quicksort内存:25kB
->在层次结构h上进行工作表扫描(成本= 0.00..179.80行= 8990宽度= 32)(实际时间= 0.001..0.001行= 1循环= 5)
输出:h.pid

现在,我可以通过在cte中用函数替换索引->>'parentId',并在

 创建函数 TEST。 MyFunction(idarg uuid)
以$ b $返回文本b $ BODY $
选择t.indexes-> Id是来自 TEST的结果。 MyTable t
t.id = idarg
$ BODY $
LANGUAGE sql IMMUTABLE;

有索引

 在MyTable 
上创建索引MyFunctionIndex使用btree
(MyFunction(id))

现在需要0.01s来执行带有分析的查询

 ->递归联合(cost = 23.81..5333205.06行= 80794929宽度= 1219)(实际时间= 0.163..0.291行= 5循环= 1)嵌套循环(cost = 23.81..3372.65行= 899宽度= 1219)(实际时间= 0.082..0.084行= 1循环= 1)
输出:(t.indexes->>'parentId': :text),t.id,t.content,t.modified
->在 TEST上使用 MyTable_pkey进行索引扫描。 MyTable c(成本= 0.42..8.44行= 1宽度= 123)(实际时间= 0.019..0.019行= 1循环= 1)
输出: c.id,c.sourceid,c.viewid,c.content,c.indexes,c.statekey,c.modified
Index Cond:(c.id = ANY('{1c725f08-0324-41e9-b417 -5ec885fb1cc9}':: uuid []))
->在 TEST。 MyTable t上进行位图堆扫描(成本= 23.39..3352.98行= 899宽度= 1219)(实际时间= 0.037..0.037行= 1循环= 1)
输出:t.id ,t.content,t.indexes
重新检查条件:(( TEST。 MyFunction(t.id)=(c.indexes->>'parentId':: text)))
堆块:精确= 1
->对 MyFunctionIndex进行位图索引扫描(成本= 0.00..23.17行= 899宽度= 0)(实际时间= 0.025..0.025行= 1循环= 1)
索引条件:( TEST。 MyFunction (t.id)=(c.indexes->>'parentId':: text))

// UNION PART
->嵌套循环(成本= 0.42..371393.38行= 8079403宽度= 1219)(实际时间= 0.012..0.013行= 1循环= 5)
输出:(t_1.indexes->>'parentId': :text),t_1.id,t_1.content
->在层次结构h上进行工作表扫描(成本= 0.00..179.80行= 8990宽度= 32)(实际时间= 0.000..0.000行= 1循环= 5)
输出:h.pid,h.id,h。内容
->使用 TEST上的 MyFunctionIndex进行索引扫描。 MyTable t_1(成本= 0.42..30.06行= 899宽度= 1219)(实际时间= 0.010..0.010行= 1循环= 5)
输出: t_1.id,t_1.content,t_1.indexes
索引条件:( TEST。 MyFunction(t_1.id)= h.pid)

那么为什么索引不能像functionindex一样快运行呢?

那里似乎有多余的排序。
我不想只使用函数索引的原因是它是IMMUTABLE的,因此索引在INSERT / UPDATE / DELETE之后不会自动更新。



PS我不是在寻找架构更改建议。

解决方案

看起来像Gin索引表现良好。
如果我在索引列上创建Gin索引,然后将Join更改为

  ON t.indexes @ > jsonb_build_object('Id',c.indexes->'parentId')

在何处

  t.indexes @> jsonb_build_object('Id',h.pid)

它不如纯函数索引快,但是至少它会动态更新,并且执行计划没有不必要的排序



可以通过添加gin索引标志 jsonb_path_ops 来进一步提高性能。 >


I have a JsonB column in my table which holds hierarchical information.

MyTable (id uuid, indexes jsonb, content bytea)

Now if I create a CTE say

WITH RECURSIVE hierarchy(pid, id, content) AS (
  --load first parents
  SELECT t.indexes ->> 'parentId' as pId, t.id, t.content FROM MyTable c
  JOIN MyTable t ON t.indexes ->> 'Id' = c.indexes ->> 'parentId' 
  WHERE c.Id = ANY('{..Some UUIDS}')
  UNION
  SELECT t.indexes ->> 'parentId' as pId, t.id, t.content
  FROM hierarchy h, MyTable t
  WHERE t.indexes ->> 'Id' = h.pid
) SELECT id, content from hierarchy

Now an example run of building a parent tree from 2 nodes within a table of 300K records takes approximately 10s.

Now if I create an index

CREATE INDEX MyIndex ON MyTable
USING btree
((indexes ->> 'Id')

This reduces the time to 4.5s. This produces an analysis of

    ->  Recursive Union  (cost=23.81..4528423.71 rows=80794929 width=1219) (actual time=0.188..1802.636 rows=5 loops=1)
          ->  Nested Loop  (cost=23.81..3150.15 rows=899 width=1219) (actual time=0.132..0.133 rows=1 loops=1)
                Output: (t.indexes ->> 'parentId'::text), t.id, t.content
                ->  Index Scan using "MyTable_pkey" on "TEST"."MyTable" c  (cost=0.42..8.44 rows=1 width=123) (actual time=0.053..0.053 rows=1 loops=1)
                      Output: c.id, c.content, c.indexes
                      Index Cond: (c.id = ANY ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[]))
                ->  Bitmap Heap Scan on "TEST"."MyTable" t  (cost=23.39..3130.48 rows=899 width=1219) (actual time=0.066..0.066 rows=1 loops=1)
                      Output: t.id, t.content, t.indexes
                      Recheck Cond: (((t.indexes ->> 'Id'::text) = (c.indexes ->> 'parentId'::text)))
                      Heap Blocks: exact=1
                      ->  Bitmap Index Scan on "MyIndex"  (cost=0.00..23.17 rows=899 width=0) (actual time=0.055..0.055 rows=1 loops=1)
                            Index Cond: ((t.indexes ->> 'Id'::text) = (c.indexes ->> 'parentId'::text))

//UNION PART
          ->  Merge Join  (cost=770.60..290937.50 rows=8079403 width=1219) (actual time=360.467..360.476 rows=1 loops=5)
                Output: (t_1.indexes ->> 'parentId'::text), t_1.id, t_1.content
                Merge Cond: ((t_1.indexes ->> 'Id'::text) = h.pid)
                ->  Index Scan using "MyIndex" on "TEST"."MyTable" t_1  (cost=0.42..127680.55 rows=179742 width=1219) (actual time=0.019..288.168 rows=60478 loops=5)
                      Output: t_1.id, t_1.sourceid, t_1.content, t_1.indexes
                ->  Sort  (cost=770.18..792.65 rows=8990 width=32) (actual time=0.010..0.011 rows=1 loops=5)
                      Output: h.pid
                      Sort Key: h.pid
                      Sort Method: quicksort  Memory: 25kB
                      ->  WorkTable Scan on hierarchy h  (cost=0.00..179.80 rows=8990 width=32) (actual time=0.001..0.001 rows=1 loops=5)
                            Output: h.pid

Now I can get massive speed impovements by replacing indexes ->> 'parentId' with a function in the cte, and creating an index on the function.

CREATE FUNCTION "TEST"."MyFunction"(idarg uuid)
  RETURNS text AS
$BODY$ 
SELECT t.indexes ->> 'Id' as result FROM "TEST"."MyTable" t 
WHERE t.id = idarg 
$BODY$
LANGUAGE sql IMMUTABLE;

With an index

CREATE INDEX MyFunctionIndex ON MyTable
USING btree
(MyFunction(id))

This now takes 0.01s to execute the query With Analysis

->  Recursive Union  (cost=23.81..5333205.06 rows=80794929 width=1219) (actual time=0.163..0.291 rows=5 loops=1)
      ->  Nested Loop  (cost=23.81..3372.65 rows=899 width=1219) (actual time=0.082..0.084 rows=1 loops=1)
            Output: (t.indexes ->> 'parentId'::text), t.id, t.content, t.modified
            ->  Index Scan using "MyTable_pkey" on "TEST"."MyTable" c  (cost=0.42..8.44 rows=1 width=123) (actual time=0.019..0.019 rows=1 loops=1)
                  Output: c.id, c.sourceid, c.viewid, c.content, c.indexes, c.statekey, c.modified
                  Index Cond: (c.id = ANY ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[]))
            ->  Bitmap Heap Scan on "TEST"."MyTable" t  (cost=23.39..3352.98 rows=899 width=1219) (actual time=0.037..0.037 rows=1 loops=1)
                  Output: t.id, t.content, t.indexes
                  Recheck Cond: (("TEST"."MyFunction"(t.id) = (c.indexes ->> 'parentId'::text)))
                  Heap Blocks: exact=1
                  ->  Bitmap Index Scan on "MyFunctionIndex"  (cost=0.00..23.17 rows=899 width=0) (actual time=0.025..0.025 rows=1 loops=1)
                        Index Cond: ("TEST"."MyFunction"(t.id) = (c.indexes ->> 'parentId'::text))

//UNION PART
          ->  Nested Loop  (cost=0.42..371393.38 rows=8079403 width=1219) (actual time=0.012..0.013 rows=1 loops=5)
                Output: (t_1.indexes ->> 'parentId'::text), t_1.id, t_1.content
                ->  WorkTable Scan on hierarchy h  (cost=0.00..179.80 rows=8990 width=32) (actual time=0.000..0.000 rows=1 loops=5)
                      Output: h.pid, h.id, h.content
                ->  Index Scan using "MyFunctionIndex" on "TEST"."MyTable" t_1  (cost=0.42..30.06 rows=899 width=1219) (actual time=0.010..0.010 rows=1 loops=5)
                      Output: t_1.id, t_1.content, t_1.indexes
                      Index Cond: ("TEST"."MyFunction"(t_1.id) = h.pid)

So why cant the index run as fast as the functionindex?
There seems to be a superfluous sort in there. And the reason I dont want to just use the function index is that it is IMMUTABLE so the index wont auto-update after INSERT/UPDATE/DELETE.

PS I am not looking for schema change suggestions.

解决方案

Looks like Gin indexes behave well. If I create a Gin index on the indexes column, and then change the Join to

ON t.indexes @> jsonb_build_object('Id', c.indexes -> 'parentId')

And the Where to

WHERE t.indexes @> jsonb_build_object('Id', h.pid)

Its not as fast as the purely function index, but it least it will update dynamically, and the execution plan doesnt have that unnecessary sort

Further performance improvements can be made by adding the gin index flag jsonb_path_ops

这篇关于Postgres分层(jsonb)CTE不必要地变慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆