Improving performance of Postgres jsonb queries combined with relational queries

Problem description

I have a single SELECT that queries both regular postgres tables and a jsonb column. When I SELECT the entire jsonb column, the query is fast (574 ms). However when I instead select a top-level path of the same jsonb column, the query slows down by 6x (3241ms). My final query needs to access the string array values from 4 of these top-level jsonb paths, which slows the query right down to 5 seconds.

I have about 50K records in the cfiles table and the jsonb column cfiles.property_values is structured like this:

{
 "Sample Names":["up to 200 short strings..."],
 "Project IDs": ["up to 10 short strings..."],
 "Run IDs":     ["up to 10 short strings..."],
 "Data Type":   ["up to 10 short strings..."]
}

Following this answer I tried adding the GIN index below, but it had very little effect (run times in the comments in the query below). I'm assuming that's because my query is not pure json using the @> operator, and is combined with a relational query.

CREATE INDEX ON cfiles USING GIN (property_values jsonb_path_ops);
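
For contrast, a pure containment predicate is the kind of query a jsonb_path_ops GIN index can serve (a hypothetical filter value is shown; my actual query never filters on property_values at all, which presumably is why the index goes unused):

SELECT id, name
FROM cfiles
WHERE property_values @> '{"Project IDs": ["P-0001"]}';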

I'm surprised at the huge difference between getting the whole column and querying even just the top-level json keys. At this point it seems more performant to fetch the whole jsonb column as a string, split it on the commas, and chomp the quotes, which is a hack I'd prefer to avoid.

I'm aiming for <2000ms - are there any other ways I can improve the performance? (query, table and explain below) - Update: Using PostgreSQL Version 12

SELECT
-- FAST OPTION: getting all of json: no GIN=579ms; with GIN=574ms
cfiles.property_values as "1907",
-- == vs ==
-- SLOW OPTION: getting a json path: no GIN=3273ms; with GIN=3241ms
cfiles.property_values #>> '{"Sample Names"}' as "1907",
-- adding another path: with GIN=4028ms
cfiles.property_values #>> '{"Project IDs"}' as "1908",
-- adding yet another path: with GIN=4774ms
cfiles.property_values #>> '{"Run IDs"}' as "1909",
-- adding yet another path: with GIN=5558ms
cfiles.property_values #>> '{"Data Type"}' as "1910",
-- ==== rest of query below I can't change ====
user_permissions.notified_at::text as "111",
group_permissions.notified_at::text as "112",
user_permissions.task_id::text as "113",
group_permissions.task_id::text as "114",
datasets.id as "151",
datasets.name as "154",
datasets.path as "155",
datasets.last_modified as "156",
datasets.file_count as "157",
datasets.locked as "158",
datasets.content_types as "159",
cfiles.name as "105",
cfiles.last_modified as "107",
pg_size_pretty(cfiles.size::bigint) as "106",
cfiles.id as "101",
cfiles.tid as "102",
cfiles.uuid as "103",
cfiles.path as "104",
cfiles.content_type as "108",
cfiles.locked as "109",
cfiles.checksum as "110"
FROM cfiles
JOIN datasets ON datasets.id=cfiles.dataset_id
LEFT JOIN user_permissions ON (user_permissions.cfile_id=cfiles.id OR user_permissions.dataset_id=datasets.id)
LEFT JOIN users on users.id=user_permissions.user_id
LEFT JOIN group_permissions ON (group_permissions.cfile_id=cfiles.id OR group_permissions.dataset_id=datasets.id)
LEFT JOIN groups ON groups.id=group_permissions.group_id
LEFT JOIN user_groups ON groups.id=user_groups.group_id
LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id=cfiles.id
WHERE
cfiles.tid=5
ORDER BY "107" desc
LIMIT 20
OFFSET 0

                                           Table "public.cfiles"
     Column      |            Type             | Collation | Nullable |              Default               
-----------------+-----------------------------+-----------+----------+------------------------------------
 id              | bigint                      |           | not null | nextval('cfiles_id_seq'::regclass)
 tid             | bigint                      |           | not null | 
 uuid            | uuid                        |           | not null | gen_random_uuid()
 dataset_id      | bigint                      |           | not null | 
 path            | character varying           |           | not null | 
 name            | character varying           |           |          | 
 checksum        | character varying           |           |          | 
 size            | bigint                      |           |          | 
 last_modified   | timestamp without time zone |           |          | 
 content_type    | character varying           |           |          | 
 locked          | boolean                     |           | not null | false
 property_values | jsonb                       |           |          | 
 created_at      | timestamp without time zone |           | not null | 
 updated_at      | timestamp without time zone |           | not null | 
Indexes:
    "cfiles_pkey" PRIMARY KEY, btree (id)
    "cfiles_property_values_idx" gin (property_values jsonb_path_ops)
    "index_cfiles_dataset_id_path" UNIQUE, btree (dataset_id, path)
    "index_cfiles_name" btree (name)
    "index_cfiles_tid" btree (tid)
    "index_cfiles_uuid_id_path" UNIQUE, btree (uuid)
Foreign-key constraints:
    "cfiles_datasets_fk" FOREIGN KEY (dataset_id) REFERENCES datasets(id)
    "cfiles_tenants_fk" FOREIGN KEY (tid) REFERENCES tenants(id)
Referenced by:
    TABLE "group_permissions" CONSTRAINT "group_permissions_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)
    TABLE "picklist_cfiles" CONSTRAINT "picklist_cfiles_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)
    TABLE "user_permissions" CONSTRAINT "user_permissions_cfiles_fk" FOREIGN KEY (cfile_id) REFERENCES cfiles(id)

Slow query plan:

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13700.06..13700.11 rows=20 width=662) (actual time=5702.511..5702.521 rows=20 loops=1)
   Output: ((cfiles.property_values #>> '{"Sample Names"}'::text[])), ((cfiles.property_values #>> '{"Project IDs"}'::text[])), ((cfiles.property_values #>> '{"Run IDs"}'::text[])), ((cfiles.property_values #>> '{"Data Type"}'::text[])), ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
   ->  Sort  (cost=13700.06..13810.61 rows=44219 width=662) (actual time=5702.508..5702.512 rows=20 loops=1)
         Output: ((cfiles.property_values #>> '{"Sample Names"}'::text[])), ((cfiles.property_values #>> '{"Project IDs"}'::text[])), ((cfiles.property_values #>> '{"Run IDs"}'::text[])), ((cfiles.property_values #>> '{"Data Type"}'::text[])), ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
         Sort Key: cfiles.last_modified DESC
         Sort Method: top-N heapsort  Memory: 344kB
         ->  Hash Left Join  (cost=39.53..12523.41 rows=44219 width=662) (actual time=2.535..5526.409 rows=44255 loops=1)
               Output: (cfiles.property_values #>> '{"Sample Names"}'::text[]), (cfiles.property_values #>> '{"Project IDs"}'::text[]), (cfiles.property_values #>> '{"Run IDs"}'::text[]), (cfiles.property_values #>> '{"Data Type"}'::text[]), (user_permissions.notified_at)::text, (group_permissions.notified_at)::text, (user_permissions.task_id)::text, (group_permissions.task_id)::text, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, pg_size_pretty(cfiles.size), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
               Hash Cond: (cfiles.id = picklist_cfiles.cfile_id)
               ->  Nested Loop Left Join  (cost=38.19..10918.99 rows=44219 width=867) (actual time=1.639..632.739 rows=44255 loops=1)
                     Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, user_permissions.notified_at, user_permissions.task_id, group_permissions.notified_at, group_permissions.task_id
                     Join Filter: ((user_permissions.cfile_id = cfiles.id) OR (user_permissions.dataset_id = datasets.id))
                     Rows Removed by Join Filter: 177020
                     ->  Nested Loop Left Join  (cost=38.19..7822.61 rows=44219 width=851) (actual time=1.591..464.449 rows=44255 loops=1)
                           Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, group_permissions.notified_at, group_permissions.task_id
                           Join Filter: ((group_permissions.cfile_id = cfiles.id) OR (group_permissions.dataset_id = datasets.id))
                           Rows Removed by Join Filter: 354040
                           ->  Hash Join  (cost=35.75..4723.32 rows=44219 width=835) (actual time=1.301..163.411 rows=44255 loops=1)
                                 Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                                 Inner Unique: true
                                 Hash Cond: (cfiles.dataset_id = datasets.id)
                                 ->  Seq Scan on public.cfiles  (cost=0.00..4570.70 rows=44219 width=644) (actual time=0.044..49.425 rows=44255 loops=1)
                                       Output: cfiles.id, cfiles.tid, cfiles.uuid, cfiles.dataset_id, cfiles.path, cfiles.name, cfiles.checksum, cfiles.size, cfiles.last_modified, cfiles.content_type, cfiles.locked, cfiles.property_values, cfiles.created_at, cfiles.updated_at
                                       Filter: (cfiles.tid = 5)
                                       Rows Removed by Filter: 1561
                                 ->  Hash  (cost=28.11..28.11 rows=611 width=199) (actual time=1.234..1.235 rows=611 loops=1)
                                       Output: datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                                       Buckets: 1024  Batches: 1  Memory Usage: 149kB
                                       ->  Seq Scan on public.datasets  (cost=0.00..28.11 rows=611 width=199) (actual time=0.012..0.571 rows=611 loops=1)
                                             Output: datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                           ->  Materialize  (cost=2.44..3.97 rows=4 width=32) (actual time=0.000..0.002 rows=8 loops=44255)
                                 Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                 ->  Hash Right Join  (cost=2.44..3.95 rows=4 width=32) (actual time=0.170..0.248 rows=8 loops=1)
                                       Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                       Hash Cond: (user_groups.group_id = groups.id)
                                       ->  Seq Scan on public.user_groups  (cost=0.00..1.34 rows=34 width=8) (actual time=0.022..0.056 rows=34 loops=1)
                                             Output: user_groups.id, user_groups.tid, user_groups.user_id, user_groups.group_id, user_groups.created_at, user_groups.updated_at
                                       ->  Hash  (cost=2.39..2.39 rows=4 width=40) (actual time=0.121..0.121 rows=4 loops=1)
                                             Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Hash Right Join  (cost=1.09..2.39 rows=4 width=40) (actual time=0.063..0.092 rows=4 loops=1)
                                                   Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                                   Hash Cond: (groups.id = group_permissions.group_id)
                                                   ->  Seq Scan on public.groups  (cost=0.00..1.19 rows=19 width=8) (actual time=0.010..0.017 rows=19 loops=1)
                                                         Output: groups.id, groups.tid, groups.name, groups.description, groups.default_uview, groups.created_at, groups.updated_at
                                                   ->  Hash  (cost=1.04..1.04 rows=4 width=40) (actual time=0.032..0.033 rows=4 loops=1)
                                                         Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                                                         Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                         ->  Seq Scan on public.group_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.017..0.022 rows=4 loops=1)
                                                               Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                     ->  Materialize  (cost=0.00..1.06 rows=4 width=40) (actual time=0.000..0.001 rows=4 loops=44255)
                           Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
                           ->  Seq Scan on public.user_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.021..0.025 rows=4 loops=1)
                                 Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
               ->  Hash  (cost=1.15..1.15 rows=15 width=8) (actual time=0.040..0.040 rows=15 loops=1)
                     Output: picklist_cfiles.cfile_id
                     Buckets: 1024  Batches: 1  Memory Usage: 9kB
                     ->  Seq Scan on public.picklist_cfiles  (cost=0.00..1.15 rows=15 width=8) (actual time=0.010..0.017 rows=15 loops=1)
                           Output: picklist_cfiles.cfile_id
 Planning Time: 3.141 ms
 Execution Time: 5702.799 ms
(61 rows)

Update: Refactoring to a CTE pattern got me down to 20ms

WITH T as (
    select cfiles.property_values as prop_vals,
    user_permissions.notified_at::text as "111",
    group_permissions.notified_at::text as "112",
    user_permissions.task_id::text as "113",
    group_permissions.task_id::text as "114",
    datasets.id as "151",
    datasets.name as "154",
    datasets.path as "155",
    datasets.last_modified as "156",
    datasets.file_count as "157",
    datasets.locked as "158",
    datasets.content_types as "159",
    cfiles.name as "105",
    cfiles.last_modified as "107",
    pg_size_pretty(cfiles.size::bigint) as "106",
    cfiles.id as "101",
    cfiles.tid as "102",
    cfiles.uuid as "103",
    cfiles.path as "104",
    cfiles.content_type as "108",
    cfiles.locked as "109",
    cfiles.checksum as "110"
    FROM cfiles
    JOIN datasets ON datasets.id=cfiles.dataset_id
    LEFT JOIN user_permissions ON (user_permissions.cfile_id=cfiles.id OR user_permissions.dataset_id=datasets.id)
    LEFT JOIN users on users.id=user_permissions.user_id
    LEFT JOIN group_permissions ON (group_permissions.cfile_id=cfiles.id OR group_permissions.dataset_id=datasets.id)
    LEFT JOIN groups ON groups.id=group_permissions.group_id
    LEFT JOIN user_groups ON groups.id=user_groups.group_id
    LEFT JOIN picklist_cfiles ON picklist_cfiles.cfile_id=cfiles.id
    WHERE
    cfiles.tid=5
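    -- caveat: without an ORDER BY before this LIMIT, the CTE returns an
    -- arbitrary 20 rows with tid=5, not the 20 most recently modified
    -- rows as in the original query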
    LIMIT 20
) 
SELECT
prop_vals ->> 'Sample Names' as "1907",
prop_vals ->> 'Project IDs' as "1908",
prop_vals ->> 'Run IDs' as "1909",
prop_vals ->> 'Data Type' as "1910",
"111", "112", "113", "114", "151", "154", "155", "156", "157",
"158", "159", "105", "107", "106", "101", "102", "103", "104",
"108", "109", "110"
FROM T
ORDER BY "107" desc;

CTE query plan:

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=16.18..16.23 rows=20 width=662) (actual time=18.771..18.779 rows=20 loops=1)
   Output: ((t.prop_vals ->> 'Sample Names'::text)), ((t.prop_vals ->> 'Project IDs'::text)), ((t.prop_vals ->> 'Run IDs'::text)), ((t.prop_vals ->> 'Data Type'::text)), t."111", t."112", t."113", t."114", t."151", t."154", t."155", t."156", t."157", t."158", t."159", t."105", t."107", t."106", t."101", t."102", t."103", t."104", t."108", t."109", t."110"
   Sort Key: t."107" DESC
   Sort Method: quicksort  Memory: 368kB
   ->  Subquery Scan on t  (cost=4.05..15.74 rows=20 width=662) (actual time=1.091..18.412 rows=20 loops=1)
         Output: (t.prop_vals ->> 'Sample Names'::text), (t.prop_vals ->> 'Project IDs'::text), (t.prop_vals ->> 'Run IDs'::text), (t.prop_vals ->> 'Data Type'::text), t."111", t."112", t."113", t."114", t."151", t."154", t."155", t."156", t."157", t."158", t."159", t."105", t."107", t."106", t."101", t."102", t."103", t."104", t."108", t."109", t."110"
         ->  Limit  (cost=4.05..15.34 rows=20 width=987) (actual time=0.320..1.241 rows=20 loops=1)
               Output: cfiles.property_values, ((user_permissions.notified_at)::text), ((group_permissions.notified_at)::text), ((user_permissions.task_id)::text), ((group_permissions.task_id)::text), datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, (pg_size_pretty(cfiles.size)), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
               ->  Nested Loop Left Join  (cost=4.05..24965.23 rows=44219 width=987) (actual time=0.318..1.224 rows=20 loops=1)
                     Output: cfiles.property_values, (user_permissions.notified_at)::text, (group_permissions.notified_at)::text, (user_permissions.task_id)::text, (group_permissions.task_id)::text, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, cfiles.name, cfiles.last_modified, pg_size_pretty(cfiles.size), cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum
                     Join Filter: ((user_permissions.cfile_id = cfiles.id) OR (user_permissions.dataset_id = datasets.id))
                     Rows Removed by Join Filter: 80
                     ->  Nested Loop Left Join  (cost=4.05..20873.92 rows=44219 width=851) (actual time=0.273..1.056 rows=20 loops=1)
                           Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types, group_permissions.notified_at, group_permissions.task_id
                           Join Filter: ((group_permissions.cfile_id = cfiles.id) OR (group_permissions.dataset_id = datasets.id))
                           Rows Removed by Join Filter: 160
                           ->  Nested Loop  (cost=1.61..17774.63 rows=44219 width=835) (actual time=0.125..0.745 rows=20 loops=1)
                                 Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, datasets.id, datasets.name, datasets.path, datasets.last_modified, datasets.file_count, datasets.locked, datasets.content_types
                                 Inner Unique: true
                                 ->  Hash Left Join  (cost=1.34..4738.00 rows=44219 width=644) (actual time=0.094..0.475 rows=20 loops=1)
                                       Output: cfiles.property_values, cfiles.name, cfiles.last_modified, cfiles.size, cfiles.id, cfiles.tid, cfiles.uuid, cfiles.path, cfiles.content_type, cfiles.locked, cfiles.checksum, cfiles.dataset_id
                                       Hash Cond: (cfiles.id = picklist_cfiles.cfile_id)
                                       ->  Seq Scan on public.cfiles  (cost=0.00..4570.70 rows=44219 width=644) (actual time=0.046..0.360 rows=20 loops=1)
                                             Output: cfiles.id, cfiles.tid, cfiles.uuid, cfiles.dataset_id, cfiles.path, cfiles.name, cfiles.checksum, cfiles.size, cfiles.last_modified, cfiles.content_type, cfiles.locked, cfiles.property_values, cfiles.created_at, cfiles.updated_at
                                             Filter: (cfiles.tid = 5)
                                             Rows Removed by Filter: 629
                                       ->  Hash  (cost=1.15..1.15 rows=15 width=8) (actual time=0.034..0.035 rows=15 loops=1)
                                             Output: picklist_cfiles.cfile_id
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Seq Scan on public.picklist_cfiles  (cost=0.00..1.15 rows=15 width=8) (actual time=0.010..0.018 rows=15 loops=1)
                                                   Output: picklist_cfiles.cfile_id
                                 ->  Index Scan using datasets_pkey on public.datasets  (cost=0.28..0.29 rows=1 width=199) (actual time=0.008..0.008 rows=1 loops=20)
                                       Output: datasets.id, datasets.tid, datasets.bucket_path_id, datasets.path, datasets.name, datasets.last_modified, datasets.file_count, datasets.size, datasets.content_types, datasets.locked, datasets.created_at, datasets.updated_at
                                       Index Cond: (datasets.id = cfiles.dataset_id)
                           ->  Materialize  (cost=2.44..3.97 rows=4 width=32) (actual time=0.005..0.009 rows=8 loops=20)
                                 Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                 ->  Hash Right Join  (cost=2.44..3.95 rows=4 width=32) (actual time=0.088..0.122 rows=8 loops=1)
                                       Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id
                                       Hash Cond: (user_groups.group_id = groups.id)
                                       ->  Seq Scan on public.user_groups  (cost=0.00..1.34 rows=34 width=8) (actual time=0.007..0.016 rows=34 loops=1)
                                             Output: user_groups.id, user_groups.tid, user_groups.user_id, user_groups.group_id, user_groups.created_at, user_groups.updated_at
                                       ->  Hash  (cost=2.39..2.39 rows=4 width=40) (actual time=0.069..0.069 rows=4 loops=1)
                                             Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                             Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                             ->  Hash Right Join  (cost=1.09..2.39 rows=4 width=40) (actual time=0.043..0.064 rows=4 loops=1)
                                                   Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, groups.id
                                                   Hash Cond: (groups.id = group_permissions.group_id)
                                                   ->  Seq Scan on public.groups  (cost=0.00..1.19 rows=19 width=8) (actual time=0.006..0.011 rows=19 loops=1)
                                                         Output: groups.id, groups.tid, groups.name, groups.description, groups.default_uview, groups.created_at, groups.updated_at
                                                   ->  Hash  (cost=1.04..1.04 rows=4 width=40) (actual time=0.022..0.022 rows=4 loops=1)
                                                         Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                                                         Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                         ->  Seq Scan on public.group_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.009..0.014 rows=4 loops=1)
                                                               Output: group_permissions.notified_at, group_permissions.task_id, group_permissions.cfile_id, group_permissions.dataset_id, group_permissions.group_id
                     ->  Materialize  (cost=0.00..1.06 rows=4 width=40) (actual time=0.001..0.003 rows=4 loops=20)
                           Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
                           ->  Seq Scan on public.user_permissions  (cost=0.00..1.04 rows=4 width=40) (actual time=0.018..0.022 rows=4 loops=1)
                                 Output: user_permissions.notified_at, user_permissions.task_id, user_permissions.cfile_id, user_permissions.dataset_id, user_permissions.user_id
 Planning Time: 4.049 ms
 Execution Time: 19.128 ms
(60 rows)

Accepted answer

Your slow query is deTOASTing the large jsonb data for all 44255 rows, and then carrying the parsed-out values through the sort to pick out the top 20 rows. (I don't know why it does the deTOASTing eagerly like that.) So 44235 jsonb values were deTOASTed just to be thrown away.
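
As a rough way to see how much TOASTed jsonb the slow plan has to expand, one could compare the stored datum size against the fully expanded text (a sketch against the question's cfiles table; pg_column_size reports the compressed/stored size of each datum):

SELECT pg_size_pretty(sum(pg_column_size(property_values))) AS stored_size,
       pg_size_pretty(sum(octet_length(property_values::text))) AS expanded_size
FROM cfiles
WHERE tid = 5;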

Your fast query is (presumably) returning TOAST pointers from the hash join, sorting the rows with those small pointers, and then deTOASTing only the 20 survivors. In the case of EXPLAIN ANALYZE, it doesn't even deTOAST the survivors; it just throws the pointers away.

That is the "why". As for what to do about it: if you truly can't alter any of the query below the very top part, I doubt there is anything you can do about it on the server side.

If you can modify the query more substantially, then you can improve the run time with a CTE. Have the CTE select the entire jsonb, and then the select on the CTE pulls the value out of it.

WITH T as (
    select cfiles.property_values as "1907", <rest of query>
)
SELECT "1907"->>'name1', "1907"->>'name2', <rest of select list>
FROM T;
