加快对嵌套在对象jsonb数组中的键值的范围测试 [英] Speed up range test for key values nested in jsonb array of objects
问题描述
假设我有以下parents
表:
create table parents (
id integer not null constraint parents_pkey primary key,
name text not null,
children jsonb not null
);
其中children
是具有以下结构的 json数组:
Where children
is a json array of the following structure:
[
{
"name": "child1",
"age": 10
},
{
"name": "child2",
"age": 12
}
]
例如,我需要让所有育有10至12岁孩子的父母.
And I need, for example, to get all parents that have children with age between 10 and 12.
我创建以下查询:
select distinct
p.*
from
parents p, jsonb_array_elements(p.children) c
where
(c->>'age')::int between 10 and 12;
当表parents
很大时(例如1M记录),它可以很好地工作,但是非常慢.我试图在children
字段上使用'gin'索引,但这无济于事.
It works well but very slowly when the table parents
is big (for example 1M records). I tried to use 'gin' index on children
field but this did not help.
那么有没有一种方法可以加快此类查询的速度?也许还有另一种解决方案可以针对嵌套json数组中的字段进行查询/索引?
So is there a way to speed up such queries? Or maybe there is another solution to make queries/indexes against fields in the nested json arrays?
查询计划:
Unique (cost=1793091.18..1803091.18 rows=1000000 width=306) (actual time=4070.866..5106.998 rows=399947 loops=1)
-> Sort (cost=1793091.18..1795591.18 rows=1000000 width=306) (actual time=4070.864..4836.241 rows=497313 loops=1)
Sort Key: p.id, p.children, p.name
Sort Method: external merge Disk: 186040kB
-> Gather (cost=1000.00..1406321.34 rows=1000000 width=306) (actual time=0.892..1354.147 rows=497313 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Nested Loop (cost=0.00..1305321.34 rows=416667 width=306) (actual time=0.162..1794.134 rows=165771 loops=3)
-> Parallel Seq Scan on parents p (cost=0.00..51153.67 rows=416667 width=306) (actual time=0.075..239.786 rows=333333 loops=3)
-> Function Scan on jsonb_array_elements c (cost=0.00..3.00 rows=1 width=0) (actual time=0.004..0.004 rows=0 loops=1000000)
Filter: ((((value ->> 'age'::text))::integer >= 10) AND (((value ->> 'age'::text))::integer <= 12))
Rows Removed by Filter: 3
Planning time: 0.218 ms
Execution time: 5140.277 ms
推荐答案
第一个直接措施是使查询速度更快:
A first immediate measure would be to make the query you have a bit faster:
SELECT *
FROM parents p
WHERE EXISTS (
SELECT FROM jsonb_array_elements(p.children) c
WHERE (c->>'age')::int BETWEEN 10 AND 12
);
当多个数组对象匹配时,EXISTS
半联接可避免中间表中的行重复-以及外部查询中对DISTINCT ON
的需求.但这只是稍微快一点.
The EXISTS
semi-join avoids duplication of rows in the intermediate table when multiple array objects match - and the need for DISTINCT ON
in the outer query. But that's only mildly faster, yet.
The core problem is that you want to test for a range of integer values, while existing jsonb
operators do not provide such functionality.
有多种解决方法.不知道这些,这里有一个智能"解决方案可以解决给定的示例.技巧是将范围划分为不同的值,并使用jsonb
包含运算符@>
:
There are various ways around this. Not knowing any of this, here is a "smart" solution that solves the given example. The trick is to split the range to distinct values and use the jsonb
containment operator @>
:
SELECT *
FROM parents p
WHERE (p.children @> '[{"age": 10}]'
OR p.children @> '[{"age": 11}]'
OR p.children @> '[{"age": 12}]');
受jsonb_path_ops
GIN索引支持:
CREATE INDEX parents_children_gin_idx ON parents USING gin (children jsonb_path_ops);
但是,如果您的范围跨越了一个不完整的整数值,那么您将需要一些更通用的东西.正如总是一样,最佳解决方案取决于整个情况:数据分布,值频率,查询中的典型范围,是否可能为NULL值,行大小,读/写模式,每一个 jsonb
值是否具有一个或多个匹配的age
键? ...
But if your ranges span more than a hand full of integer values, you'll need something more generic. As always, the best solution depends on the complete situation: Data distribution, value frequencies, typical ranges in queries, NULL values possible?, row size, read/write patterns, does every jsonb
value have one or more matching age
key? ...
具有非常快速的专门索引的相关答案:
Related answer with specialized, very fast index:
相关:
- Index for finding an element in a JSON array
- Using indexes in json array in PostgreSQL
这篇关于加快对嵌套在对象jsonb数组中的键值的范围测试的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!