索引jsonb用于字段的数字比较 [英] Indexing jsonb for numeric comparison of fields

查看:142
本文介绍了索引jsonb用于字段的数字比较的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经用

create table resources (id serial primary key, fields jsonb);

它包含带有键的数据(从大集合中提取),值介于1到100之间,例如:

And it contains data with keys (drawn from a large set) and values between 1 and 100, like:

   id   |    fields                                                                                                 
--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      1 | {"tex": 23, "blair": 46, "cubic": 50, "raider": 57, "retard": 53, "hoariest": 78, "suturing": 25, "apostolic": 22, "unloosing": 37, "flagellated": 85}
      2 | {"egoist": 75, "poshest": 0, "annually": 19, "baptists": 29, "bicepses": 10, "eugenics": 9, "idolizes": 8, "spengler": 60, "scuppering": 13, "cliffhangers": 37}
      3 | {"entails": 27, "hideout": 22, "horsing": 98, "abortions": 88, "microsoft": 37, "spectrums": 26, "dilettante": 52, "ringmaster": 84, "floweriness": 72, "vivekananda": 24}
      4 | {"wraps": 6, "polled": 68, "coccyges": 63, "internes": 93, "unburden": 61, "aggregate": 76, "cavernous": 98, "stylizing": 65, "vamoosing": 35, "unoriginal": 40}
      5 | {"villon": 95, "monthly": 68, "puccini": 30, "samsung": 81, "branched": 33, "congeals": 6, "shriller": 47, "terracing": 27, "patriarchal": 86, "compassionately": 94}

我想搜索其值(与特定键相关联)大于某个基准值的条目.我可以做到这一点,例如通过:

I'd like to search for entries whose value (associated with a particular key) is greater than some benchmark value. I can accomplish this, for example via:

with exploded as (
    select id, (jsonb_each_text(fields)).*
    from resources)
select distinct id
    from exploded
    where key='polled' and value::integer>50;

...但是,这当然不使用索引,而是依靠表扫描.我想知道是否有:

... but of course this does not use an index, and it resorts to a table scan. I wonder if there is:

  1. 一种更有效的方式来查询已轮询"> 50的资源
  2. 一种构建支持这种查询的索引的方法

推荐答案

您尚未指定期望使用的INDEX类型,也未提供其定义.

You haven't specified what kind of INDEX you are expecting to be used and you haven't provided a definition of it.

jsonb字段的典型INDEX GIN 一个,但是在您的特定情况下,您需要比较 polled键中包含的某些值.

The typical INDEX for a jsonb field would be a GIN one, but in your specific case you need to actually compare some values contained in the polled key.

也许具有INDEX(尽管不是GIN一个!)! -expressional.html"rel =" noreferrer>表达式可能有用,但是我对此表示怀疑,它会变得非常麻烦,因为您至少需要进行双精度类型转换才能获取整数值和自定义IMMUTABLE函数实际执行CREATE INDEX语句中的类型转换.

Maybe a specific INDEX (though not a GIN one!) with an expression could be of some use, but I doubt it and it could get quite cumbersome since you would need at least a double type cast to obtain an integer value and a custom IMMUTABLE function to actually perform the type casts in your CREATE INDEX statement.

在采取仅能解决某些特定情况(如果您需要与其他fields键进行另一比较的情况?)之前,您可以尝试利用PostgreSQL 9.4 new LATERAL 功能和jsonb处理功能. 结果是查询的运行速度应比当前查询快8倍:

Before taking a complicated route which would solve just some specific cases (what if you'd need another comparison with a different fields key?), you could try to optimize your current query, taking advantage of PostgreSQL 9.4 new LATERAL capabilities and jsonb processing functions. The result is a query that should run up to 8 times faster than your current one:

SELECT r.id 
    FROM resources AS r,
    LATERAL jsonb_to_record(r.fields) AS l(polled integer) 
    WHERE l.polled > 50;


我进行了一次快速测试,以在我的注释中实践该想法,以便在实际比较值之前使用GIN INDEX限制行数,事实证明您可以真正使用GIN INDEX即使在那种情况下.

I did a quick test to put in practice the idea in my comment to use a GIN INDEX to restrict the number of rows before actually comparing the values, and it turned out you can really make some use of a GIN INDEX even in that situation.

必须使用默认的操作员类别jsonb_ops (不是更轻巧且性能更好的jsonb_path_ops)创建INDEX :

The INDEX must be created with the default operator class jsonb_ops (not the lighter and more performing jsonb_path_ops) :

CREATE INDEX ON resources USING GIN (fields);

现在,您可以利用索引的优势,只需在查询中包含一个存在的?测试即可:

Now you can take advantage of the index simply including an exist ? test in the query:

SELECT r.id
    FROM resources AS r,
    LATERAL jsonb_to_record(r.fields) AS l(polled integer) 
    WHERE r.fields ? 'polled' AND l.polled > 50;

查询现在的执行速度快了 3倍 (比第一个CTE版本快20倍).我已经测试了多达100万行,并且性能提升始终是相同的.


请记住,正如预期的那样,行数起着重要的作用:如果行数少于1K,则索引将毫无用处,查询计划程序可能不会使用它.

The query now performs about 3 times faster (which is about 20 times faster than the first CTE version). I've tested with up to 1M rows and the performance gain is always the same.


Keep in mind that, as expected, the number of rows plays an important role: with less than 1K rows the index is quite useless and the query planner probably will not use it.

也不要忘记jsonb_ops索引与实际数据相比可能会变得很大.对于像您这样的数据样本(从1K到1M行),索引本身比表中的实际数据大 170%,请自己检查:

Also don't forget the jsonb_ops index can become huge compared to the actual data. With a data sample like yours, ranging from 1K to 1M rows, the index itself is about 170% bigger than the actual data in the table, check it yourself:

SELECT pg_size_pretty(pg_total_relation_size('resources')) AS whole_table, 
       pg_size_pretty(pg_relation_size('resources')) AS data_only, 
       pg_size_pretty(pg_relation_size('resources_fields_idx')) AS gin_index_only;

仅仅给您一个想法,该表大约有300K行(如您的数据样本),该表大约为250MB,由90MB的数据和160MB的索引组成! 就我个人而言,我会使用一个没有索引的简单LATERAL JOIN来粘贴(实际上我确实会这样做).

Just to give you an idea, with about 300K rows like your data sample, the table is about 250MB, consisting of 90MB of data and 160MB of index! Personally, I would stick (and I actually do) with a simple LATERAL JOIN without an index.

这篇关于索引jsonb用于字段的数字比较的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆