Postgres分层(jsonb)CTE不必要地变慢 [英] Postgres hierarchical (jsonb) CTE unnecessarily slow
问题描述
我的表中有一个JsonB列,其中包含分层信息。
MyTable(id uuid,索引jsonb,内容bytea )
现在,如果我创建CTE,则说
具有递归层次结构(pid,id,content)AS(
$ p $的内容p>
-加载第一个父项
SELECT t.indexes->>'parentId'作为pId ,t.id,t.content来自MyTable c
加入MyTable t在t.indexes->>'Id'= c.indexes->>'parentId'
WHERE c.Id = ANY('{.. Some UUIDS}')
UNION
选择t.indexes->>'parentId'作为pId,t.id,t.content
来自层次结构h ,MyTable t
WHERE t.indexes->>'Id'= h.pid
)SELECT ID,来自层次结构
现在从300K条记录的表中的2个节点构建父树的示例过程大约需要10秒钟。
现在,如果我创建索引
创建索引MyTable上的MyIndex
使用btree
(((索引-> ‘Id’)
这将时间缩短至4.5s。
这将产生对->的分析。递归联合(cost = 23.81..4528423.71行= 80794929宽度= 1219)(实际时间= 0.188..1802.636行= 5循环= 1)
->嵌套循环(cost = 23.81..3150.15行= 899宽度= 1219)(实际时间= 0.132..0.133行= 1循环= 1)
输出:(t.indexes->>'parentId': :text),t.id,t.content
->使用 TEST上的 MyTable_pkey进行索引扫描。 MyTable c(成本= 0.42..8.44行= 1宽度= 123)(实际时间= 0.053..0.053行= 1循环= 1)
输出: c.id,c.content,c.indexes
索引条件:(c.id = ANY('{1c725f08-0324-41e9-b417-5ec885fb1cc9}':: uuid []))
- >在 TEST。 MyTable t上进行位图堆扫描(成本= 23.39..3130.48行= 899宽度= 1219)(实际时间= 0.066..0.066行= 1循环= 1)
输出:t.id ,t.content,t.indexes
重新检查条件:((((t.indexes->>'Id':: text)=(c.indexes->>'parentId':: text) ))
堆块:精确= 1
->对 MyIndex进行位图索引扫描(成本= 0.00..23.17行= 899宽度= 0)(实际时间= 0.055..0.055行= 1循环= 1)
Index Cond:(((t.indexes-> ;>'Id':: text)=(c.indexes->>'parentId':: text))
//联合部分
->合并联接(成本= 770.60..290937.50行= 8079403宽度= 1219)(实际时间= 360.467..360.476行= 1循环= 5)
输出:(t_1.indexes->>'parentId': :text),t_1.id,t_1.content
合并条件:((t_1.indexes->>'Id':: text)= h.pid)
->使用 TEST上的 MyIndex进行索引扫描。 MyTable t_1(成本= 0.42..127680.55行= 179742宽度= 1219)(实际时间= 0.019..288.168行= 60478循环= 5)
输出: t_1.id,t_1.sourceid,t_1.content,t_1.indexes
->排序(cost = 770.18..792.65行= 8990宽度= 32)(实际时间= 0.010..0.011行= 1循环= 5)
输出:h.pid
排序键:h.pid
排序方法:quicksort内存:25kB
->在层次结构h上进行工作表扫描(成本= 0.00..179.80行= 8990宽度= 32)(实际时间= 0.001..0.001行= 1循环= 5)
输出:h.pid
现在,我可以通过在cte中用函数替换索引->>'parentId',并在
创建函数 TEST。 MyFunction(idarg uuid)
以$ b $返回文本b $ BODY $
选择t.indexes-> Id是来自 TEST的结果。 MyTable t
t.id = idarg
$ BODY $
LANGUAGE sql IMMUTABLE;
有索引
在MyTable
上创建索引MyFunctionIndex使用btree
(MyFunction(id))
现在需要0.01s来执行带有分析的查询
->递归联合(cost = 23.81..5333205.06行= 80794929宽度= 1219)(实际时间= 0.163..0.291行= 5循环= 1)嵌套循环(cost = 23.81..3372.65行= 899宽度= 1219)(实际时间= 0.082..0.084行= 1循环= 1)
输出:(t.indexes->>'parentId': :text),t.id,t.content,t.modified
->在 TEST上使用 MyTable_pkey进行索引扫描。 MyTable c(成本= 0.42..8.44行= 1宽度= 123)(实际时间= 0.019..0.019行= 1循环= 1)
输出: c.id,c.sourceid,c.viewid,c.content,c.indexes,c.statekey,c.modified
Index Cond:(c.id = ANY('{1c725f08-0324-41e9-b417 -5ec885fb1cc9}':: uuid []))
->在 TEST。 MyTable t上进行位图堆扫描(成本= 23.39..3352.98行= 899宽度= 1219)(实际时间= 0.037..0.037行= 1循环= 1)
输出:t.id ,t.content,t.indexes
重新检查条件:(( TEST。 MyFunction(t.id)=(c.indexes->>'parentId':: text)))
堆块:精确= 1
->对 MyFunctionIndex进行位图索引扫描(成本= 0.00..23.17行= 899宽度= 0)(实际时间= 0.025..0.025行= 1循环= 1)
索引条件:( TEST。 MyFunction (t.id)=(c.indexes->>'parentId':: text))
// UNION PART
->嵌套循环(成本= 0.42..371393.38行= 8079403宽度= 1219)(实际时间= 0.012..0.013行= 1循环= 5)
输出:(t_1.indexes->>'parentId': :text),t_1.id,t_1.content
->在层次结构h上进行工作表扫描(成本= 0.00..179.80行= 8990宽度= 32)(实际时间= 0.000..0.000行= 1循环= 5)
输出:h.pid,h.id,h。内容
->使用 TEST上的 MyFunctionIndex进行索引扫描。 MyTable t_1(成本= 0.42..30.06行= 899宽度= 1219)(实际时间= 0.010..0.010行= 1循环= 5)
输出: t_1.id,t_1.content,t_1.indexes
索引条件:( TEST。 MyFunction(t_1.id)= h.pid)
那么为什么索引不能像functionindex一样快运行呢?
那里似乎有多余的排序。
我不想只使用函数索引的原因是它是IMMUTABLE的,因此索引在INSERT / UPDATE / DELETE之后不会自动更新。
PS我不是在寻找架构更改建议。
解决方案看起来像Gin索引表现良好。
如果我在索引列上创建Gin索引,然后将Join更改为ON t.indexes @ > jsonb_build_object('Id',c.indexes->'parentId')
在何处
t.indexes @> jsonb_build_object('Id',h.pid)
它不如纯函数索引快,但是至少它会动态更新,并且执行计划没有不必要的排序
可以通过添加gin索引标志 jsonb_path_ops 来进一步提高性能。 >
I have a JsonB column in my table which holds hierarchical information.
MyTable (id uuid, indexes jsonb, content bytea)
Now if I create a CTE say
WITH RECURSIVE hierarchy(pid, id, content) AS ( --load first parents SELECT t.indexes ->> 'parentId' as pId, t.id, t.content FROM MyTable c JOIN MyTable t ON t.indexes ->> 'Id' = c.indexes ->> 'parentId' WHERE c.Id = ANY('{..Some UUIDS}') UNION SELECT t.indexes ->> 'parentId' as pId, t.id, t.content FROM hierarchy h, MyTable t WHERE t.indexes ->> 'Id' = h.pid ) SELECT id, content from hierarchy
Now an example run of building a parent tree from 2 nodes within a table of 300K records takes approximately 10s.
Now if I create an index
CREATE INDEX MyIndex ON MyTable USING btree ((indexes ->> 'Id')
This reduces the time to 4.5s. This produces an analysis of
-> Recursive Union (cost=23.81..4528423.71 rows=80794929 width=1219) (actual time=0.188..1802.636 rows=5 loops=1) -> Nested Loop (cost=23.81..3150.15 rows=899 width=1219) (actual time=0.132..0.133 rows=1 loops=1) Output: (t.indexes ->> 'parentId'::text), t.id, t.content -> Index Scan using "MyTable_pkey" on "TEST"."MyTable" c (cost=0.42..8.44 rows=1 width=123) (actual time=0.053..0.053 rows=1 loops=1) Output: c.id, c.content, c.indexes Index Cond: (c.id = ANY ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[])) -> Bitmap Heap Scan on "TEST"."MyTable" t (cost=23.39..3130.48 rows=899 width=1219) (actual time=0.066..0.066 rows=1 loops=1) Output: t.id, t.content, t.indexes Recheck Cond: (((t.indexes ->> 'Id'::text) = (c.indexes ->> 'parentId'::text))) Heap Blocks: exact=1 -> Bitmap Index Scan on "MyIndex" (cost=0.00..23.17 rows=899 width=0) (actual time=0.055..0.055 rows=1 loops=1) Index Cond: ((t.indexes ->> 'Id'::text) = (c.indexes ->> 'parentId'::text)) //UNION PART -> Merge Join (cost=770.60..290937.50 rows=8079403 width=1219) (actual time=360.467..360.476 rows=1 loops=5) Output: (t_1.indexes ->> 'parentId'::text), t_1.id, t_1.content Merge Cond: ((t_1.indexes ->> 'Id'::text) = h.pid) -> Index Scan using "MyIndex" on "TEST"."MyTable" t_1 (cost=0.42..127680.55 rows=179742 width=1219) (actual time=0.019..288.168 rows=60478 loops=5) Output: t_1.id, t_1.sourceid, t_1.content, t_1.indexes -> Sort (cost=770.18..792.65 rows=8990 width=32) (actual time=0.010..0.011 rows=1 loops=5) Output: h.pid Sort Key: h.pid Sort Method: quicksort Memory: 25kB -> WorkTable Scan on hierarchy h (cost=0.00..179.80 rows=8990 width=32) (actual time=0.001..0.001 rows=1 loops=5) Output: h.pid
Now I can get massive speed impovements by replacing indexes ->> 'parentId' with a function in the cte, and creating an index on the function.
CREATE FUNCTION "TEST"."MyFunction"(idarg uuid) RETURNS text AS $BODY$ SELECT t.indexes ->> 'Id' as result FROM "TEST"."MyTable" t WHERE t.id = idarg $BODY$ LANGUAGE sql IMMUTABLE;
With an index
CREATE INDEX MyFunctionIndex ON MyTable USING btree (MyFunction(id))
This now takes 0.01s to execute the query With Analysis
-> Recursive Union (cost=23.81..5333205.06 rows=80794929 width=1219) (actual time=0.163..0.291 rows=5 loops=1) -> Nested Loop (cost=23.81..3372.65 rows=899 width=1219) (actual time=0.082..0.084 rows=1 loops=1) Output: (t.indexes ->> 'parentId'::text), t.id, t.content, t.modified -> Index Scan using "MyTable_pkey" on "TEST"."MyTable" c (cost=0.42..8.44 rows=1 width=123) (actual time=0.019..0.019 rows=1 loops=1) Output: c.id, c.sourceid, c.viewid, c.content, c.indexes, c.statekey, c.modified Index Cond: (c.id = ANY ('{1c725f08-0324-41e9-b417-5ec885fb1cc9}'::uuid[])) -> Bitmap Heap Scan on "TEST"."MyTable" t (cost=23.39..3352.98 rows=899 width=1219) (actual time=0.037..0.037 rows=1 loops=1) Output: t.id, t.content, t.indexes Recheck Cond: (("TEST"."MyFunction"(t.id) = (c.indexes ->> 'parentId'::text))) Heap Blocks: exact=1 -> Bitmap Index Scan on "MyFunctionIndex" (cost=0.00..23.17 rows=899 width=0) (actual time=0.025..0.025 rows=1 loops=1) Index Cond: ("TEST"."MyFunction"(t.id) = (c.indexes ->> 'parentId'::text)) //UNION PART -> Nested Loop (cost=0.42..371393.38 rows=8079403 width=1219) (actual time=0.012..0.013 rows=1 loops=5) Output: (t_1.indexes ->> 'parentId'::text), t_1.id, t_1.content -> WorkTable Scan on hierarchy h (cost=0.00..179.80 rows=8990 width=32) (actual time=0.000..0.000 rows=1 loops=5) Output: h.pid, h.id, h.content -> Index Scan using "MyFunctionIndex" on "TEST"."MyTable" t_1 (cost=0.42..30.06 rows=899 width=1219) (actual time=0.010..0.010 rows=1 loops=5) Output: t_1.id, t_1.content, t_1.indexes Index Cond: ("TEST"."MyFunction"(t_1.id) = h.pid)
So why cant the index run as fast as the functionindex?
There seems to be a superfluous sort in there. And the reason I dont want to just use the function index is that it is IMMUTABLE so the index wont auto-update after INSERT/UPDATE/DELETE.PS I am not looking for schema change suggestions.
解决方案Looks like Gin indexes behave well. If I create a Gin index on the indexes column, and then change the Join to
ON t.indexes @> jsonb_build_object('Id', c.indexes -> 'parentId')
And the Where to
WHERE t.indexes @> jsonb_build_object('Id', h.pid)
Its not as fast as the purely function index, but it least it will update dynamically, and the execution plan doesnt have that unnecessary sort
Further performance improvements can be made by adding the gin index flag jsonb_path_ops
这篇关于Postgres分层(jsonb)CTE不必要地变慢的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!