PostgreSQL: query on jsonb column - index doesn't make it quicker

Problem description

I have a table in PostgreSQL 9.6 where queries on a jsonb column are slow compared to a relational table, and adding a GIN index on the column doesn't make them any quicker.

Table:

-- create table
create table dummy_jsonb (
    id serial8,
    data jsonb,
    primary key (id)
);

-- create index
CREATE INDEX dummy_jsonb_data_index ON dummy_jsonb USING gin (data);
-- CREATE INDEX dummy_jsonb_data_index ON dummy_jsonb USING gin (data jsonb_path_ops);

Generate data:

-- generate data,
CREATE OR REPLACE FUNCTION dummy_jsonb_gen_data(n integer) RETURNS integer AS $$
DECLARE
    i integer:=1;
    name varchar;
    create_at varchar;
    json_str varchar;
BEGIN
    WHILE i<=n LOOP
        name:='dummy_' || i::text;
        create_at:=EXTRACT(EPOCH FROM date_trunc('milliseconds', now())) * 1000;
        json_str:='{
                 "name": "' || name || '",
                 "size": ' || i || ',
                 "create_at": ' || create_at || '
               }';

        insert into dummy_jsonb(data) values
        (json_str::jsonb
        );
        i:= i + 1;
    END LOOP;

    return n;
END;
$$ LANGUAGE plpgsql;

-- call function,
select dummy_jsonb_gen_data(1000000);

-- drop function,
DROP FUNCTION IF EXISTS dummy_jsonb_gen_data(integer);

Query:

select * from dummy_jsonb
where data->>'name' like 'dummy_%' and data->>'size' >= '500000'
order by data->>'size' desc
offset 50000 limit 10;

Test results:

  • The query takes 1.8 seconds on a slow VM.
  • Adding or removing the index makes no difference.
  • Switching the GIN index to jsonb_path_ops makes no difference either.

Questions:

  • Is it possible to make the query quicker, either by improving the index or the SQL?
  • If not, does that mean a relational table is more appropriate in PostgreSQL for this case?
  • And in my test MongoDB performs better; does that mean MongoDB is more suitable for this kind of storage and query?

Recommended answer

Quote from the manual:

The default GIN operator class for jsonb supports queries with top-level key-exists operators ?, ?& and ?| operators and path/value-exists operator @> [...] The non-default GIN operator class jsonb_path_ops supports indexing the @> operator only.
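
For comparison, here is a minimal sketch of conditions that the default jsonb GIN operator class can serve. It uses a hypothetical temporary table, gin_demo, filled with rows shaped like the question's data. On a table this small the planner may still prefer a sequential scan, so check the actual plan with EXPLAIN on real data.

```sql
-- Hypothetical demo table shaped like the question's dummy_jsonb
CREATE TEMP TABLE gin_demo (
    id   serial8 PRIMARY KEY,
    data jsonb
);

INSERT INTO gin_demo (data)
SELECT jsonb_build_object('name', 'dummy_' || i, 'size', i)
FROM generate_series(1, 1000) AS t(i);

-- Default jsonb GIN operator class (jsonb_ops)
CREATE INDEX gin_demo_data_idx ON gin_demo USING gin (data);
ANALYZE gin_demo;

-- Containment (@>): exact match on a key/value pair, indexable by GIN
SELECT id, data FROM gin_demo WHERE data @> '{"name": "dummy_42"}';

-- Top-level key existence (?): also indexable by the default operator class
SELECT count(*) FROM gin_demo WHERE data ? 'size';

-- Not indexable by GIN: ->> extracts text, and LIKE on that text
-- is not one of the operators listed in the quote above
EXPLAIN SELECT id FROM gin_demo WHERE data->>'name' LIKE 'dummy_4%';
```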

Your query uses LIKE and a string comparison with >= (which is probably not correct to begin with: ->> returns text, so the values are compared as strings, not as numbers). Neither of those is supported by a GIN index.
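
A quick psql check of how a comparison on ->> output (which is always text) differs from a numeric one. The literal JSON values here are made up for illustration.

```sql
-- ->> yields text, so the first two comparisons are string comparisons
SELECT ('{"size": 6}'::jsonb ->> 'size') >= '500000'            AS text_6,        -- true: '6' sorts after '5'
       ('{"size": 1000000}'::jsonb ->> 'size') >= '500000'      AS text_1000000,  -- false: '1' sorts before '5'
       ('{"size": 6}'::jsonb ->> 'size')::int >= 500000         AS int_6,         -- false
       ('{"size": 1000000}'::jsonb ->> 'size')::int >= 500000   AS int_1000000;   -- true
```

Casting with ::int, as suggested further down, restores numeric ordering.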

But even an index on (data ->> 'name') wouldn't be used for the condition data->>'name' like 'dummy_%', because that condition is true for all rows: every name starts with dummy.
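
The planner skips an index when a condition matches (almost) every row. This sketch counts how many generated names each pattern matches, using names built the same way as in the question but only for 1..1000 to keep it quick. Note that _ in a LIKE pattern matches any single character, so 'dummy_%' really means "dummy followed by at least one character".

```sql
-- Names generated like the question's, 1..1000 for brevity
WITH t AS (
    SELECT jsonb_build_object('name', 'dummy_' || i) AS data
    FROM generate_series(1, 1000) AS g(i)
)
SELECT count(*)                                                 AS total_rows,
       count(*) FILTER (WHERE data->>'name' LIKE 'dummy_%')     AS broad_matches,   -- every row
       count(*) FILTER (WHERE data->>'name' LIKE 'dummy_954%')  AS narrow_matches   -- only dummy_954
FROM t;
```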

You can create a regular btree index on the name:

CREATE INDEX ON dummy_jsonb ( (data ->> 'name') varchar_pattern_ops);

That index will be used if the condition is restrictive enough, e.g.:

where data->>'name' like 'dummy_9549%'
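
A self-contained sketch of such an index in action, using a hypothetical temporary table like_demo. It uses text_pattern_ops, the operator class for the text type that ->> returns. Whether EXPLAIN reports an index (or bitmap index) scan depends on table size and statistics, so treat the plans as something to check rather than a guaranteed result.

```sql
CREATE TEMP TABLE like_demo (id serial8 PRIMARY KEY, data jsonb);

INSERT INTO like_demo (data)
SELECT jsonb_build_object('name', 'dummy_' || i, 'size', i)
FROM generate_series(1, 100000) AS t(i);

-- B-tree expression index; the pattern_ops class lets LIKE 'prefix%'
-- use it regardless of the database collation
CREATE INDEX like_demo_name_idx ON like_demo ((data ->> 'name') text_pattern_ops);
ANALYZE like_demo;

-- Selective prefix (11 rows here): a candidate for an index scan
EXPLAIN SELECT * FROM like_demo WHERE data->>'name' LIKE 'dummy_9549%';

-- Matches every row: a sequential scan is the sensible plan
EXPLAIN SELECT * FROM like_demo WHERE data->>'name' LIKE 'dummy_%';
```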

If you need to query for the size, you can create an index on ((data ->> 'size')::int) and then use something like this:

where (data->>'size')::int >= 500000
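
A minimal sketch of that expression index on a hypothetical temporary table size_demo. The index definition needs an extra pair of parentheses around the expression, and the query has to repeat exactly the same expression for the planner to match it.

```sql
CREATE TEMP TABLE size_demo (id serial8 PRIMARY KEY, data jsonb);

INSERT INTO size_demo (data)
SELECT jsonb_build_object('name', 'dummy_' || i, 'size', i)
FROM generate_series(1, 100000) AS t(i);

-- Expression index on the numeric value of "size"
CREATE INDEX size_demo_size_idx ON size_demo (((data ->> 'size')::int));
ANALYZE size_demo;

-- Numeric comparison: 50000..100000 qualify
SELECT count(*) FROM size_demo WHERE (data->>'size')::int >= 50000;

-- A selective range like this one is where the index is expected to help
EXPLAIN SELECT * FROM size_demo WHERE (data->>'size')::int >= 99990;
```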

However, your use of limit and offset will always force the database to read all matching rows, sort them and then limit the result. This is never going to be very fast. You might want to read this article for more information on why limit/offset is not very efficient.
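
One common alternative, named plainly here as keyset (or "seek") pagination and not part of the original answer, is to start the next page after the last value the client has already seen, which a B-tree index on the sort expression can jump to directly instead of skipping OFFSET rows. A minimal sketch on a hypothetical table page_demo; it relies on size being unique, and on real data you would add a unique tie-breaker such as id to both the ORDER BY and the WHERE condition.

```sql
CREATE TEMP TABLE page_demo (id serial8 PRIMARY KEY, data jsonb);

INSERT INTO page_demo (data)
SELECT jsonb_build_object('name', 'dummy_' || i, 'size', i)
FROM generate_series(1, 1000) AS t(i);

CREATE INDEX page_demo_size_idx ON page_demo (((data ->> 'size')::int));
ANALYZE page_demo;

-- OFFSET version: the server still walks past the first 20 rows
SELECT (data->>'size')::int AS size
FROM page_demo
ORDER BY (data->>'size')::int DESC
OFFSET 20 LIMIT 10;

-- Keyset version: 981 is the last size of the previous page
-- (rows 11-20 in descending order are 990..981)
SELECT (data->>'size')::int AS size
FROM page_demo
WHERE (data->>'size')::int < 981
ORDER BY (data->>'size')::int DESC
LIMIT 10;
```

Both queries return 980 down to 971; only the keyset one stays cheap as the page number grows.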

JSON is a nice addition to the relational world, but only if you use it appropriately. If you don't need dynamic attributes for a row, use standard columns and data types. Even though JSON support in Postgres is extremely good, this doesn't mean one should use it for everything just because it's the current hype. Postgres is still a relational database and should be used as such.
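
For data whose shape is fixed, like the name/size/create_at rows in the question, a sketch of the plain relational layout could look like this. The table and index names are hypothetical; the point is that typed columns make >= a numeric comparison and can be indexed directly, without expression indexes.

```sql
CREATE TEMP TABLE dummy_rel (
    id        serial8 PRIMARY KEY,
    name      text    NOT NULL,
    size      integer NOT NULL,
    create_at bigint  NOT NULL  -- epoch milliseconds, as in the question
);

INSERT INTO dummy_rel (name, size, create_at)
SELECT 'dummy_' || i,
       i,
       (extract(epoch FROM clock_timestamp()) * 1000)::bigint
FROM generate_series(1, 100000) AS t(i);

CREATE INDEX dummy_rel_size_idx ON dummy_rel (size);
ANALYZE dummy_rel;

-- Same filter as the question, now a numeric comparison on a typed column
SELECT count(*) FROM dummy_rel WHERE name LIKE 'dummy_%' AND size >= 50000;
```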

Unrelated, but: your function to generate the test data can be simplified to a single SQL statement. You might not have been aware of the generate_series() function for things like that:

-- same keys and JSON types (numbers) as the question's generator function
insert into dummy_jsonb(data)
select jsonb_build_object('name', 'dummy_'||i, 
                          'size', i, 
                          'create_at', (EXTRACT(EPOCH FROM date_trunc('milliseconds', clock_timestamp())) * 1000)::bigint)
from generate_series(1,1000000) as t(i);
