Postgres 9.4 jsonb数组作为表 [英] Postgres 9.4 jsonb array as table

查看:186
本文介绍了Postgres 9.4 jsonb数组作为表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个json数组,其中包含大约1000个结构元素oid:aaa,instance:bbb,value:ccc。

  {_ id:37637070 
,data:[{oid:11.5.15.1。 4,value:1,instance:1.1.4}
,{oid:11.5.15.1.9,value:17,instance :1.1.4}
,{oid:12.5.15.1.5,value:0.0.0.0,instance:0}]}

oid instance 是每个json数组唯一的。如果我可以选择更改结构,我会将格式更改为键:值

  {11.5.15.1.4-1.1.4:1,11.5.15.1.9-1.1.4:17,12.5.15.1.5-0:0.0.0.0 } 

但是,如果我需要保留旧结构


  1. 从阵列中获取特定 oid 的最快方法是什么?


  2. 获得包含3列 oid instance 。或者更好的是带有oid + instance作为列标题的数据透视表。


对于2.我尝试了以下方法,但是在一张大桌子上它很慢:

 选择* 
来自(
选择a-> ;>'oid'oid,a->>'instance'实例,a->>'value'value1,id
from(
select jsonb_array_elements(config#>'{ data}')a,id
来自配置
)b
)c
其中oid ='1.3.6.1.4.1.7352.3.10.2.5.35.3'和instance = '0'和value1<> 1’ ;


解决方案

查询



缺少您的表定义。假设:

  CREATE TABLE配置(
config_id serial PRIMARY KEY
,config jsonb NOT NULL
);

查找及其行对于给定的 oid 实例

  SELECT c.config_id,d->>'value'AS value 
FROM configuration c
,jsonb_array_elements(config->'data')d - 默认col名称是值
WHERE d->>'oid'='1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND d->>'instance'=' 0'
AND d->>'value'<> '1'

这是一个隐含的 LATERAL 加入。比较:






2)获得一个包含3列 oid instance 和<$ c $的表的最快方法是什么? c> value。


我想使用 jsonb_populate_recordset() ,然后您可以在表定义中提供数据类型。假设 text for all:

  CREATE TEMP TABLE data_pattern(oid text,值文本,实例文本); 

也可以是持久(非临时)表。这个仅适用于本届会议。然后:

  SELECT c.config_id,d。* 
FROM configuration c
,jsonb_populate_recordset(NULL: :data_pattern,c.config->'data')d

这就是全部。第一个查询被重写:

  SELECT c.config_id,d。* 
FROM configuration c
,jsonb_populate_recordset (NULL :: data_pattern,c.config->'data')d
WHERE d.oid ='1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND d.instance = '0'
AND d.value<> 1’ ;

但是比第一个查询慢。使用更大表的性能的关键是索引支持:



索引



您可以轻松索引规范化(已翻译)表或您在问题中提出的替代布局。索引当前布局并不是那么明显,但也可能。为了获得最佳性能,我建议在数据键上使用 jsonb_path_ops 运算符类进行功能索引。 每个文档:


jsonb_ops jsonb_path_ops GIN
索引是前者为每个键
和数据中的值创建独立索引项,而后者仅为
中的每个值创建索引项。


这个应该可以创造奇迹

  CREATE INDEX configuration_my_idx ON配置
使用gin((config->'data')jsonb_path_ops);

有人可能会认为只有JSON数组元素的完全匹配才能起作用,如:

  SELECT * FROM configuration 
WHERE(config->'data')@> '[{oid:1.3.6.1.4.1.7352.3.10.2.5.35.3
,instance:0,value:1234}]';

注意JSON数组表示法(包围 [提供值的 ),这是必需的。



但是具有子集的数组元素密钥也可以工作:

  SELECT * FROM configuration 
WHERE(config->'data' )@> '[{oid:1.3.6.1.4.1.7352.3.10.2.5.35.3
,instance:0}]'

困难的部分是加入你看似不吉利的添加谓词值<> 1’ 。必须小心将所有谓词应用于相同的数组元素。你可以将它与第一个查询结合起来:

  SELECT c。*,d->>'value'AS value 
FROM配置c
,jsonb_array_elements(config->'data')d
WHERE(config->'data')@> '[{oid:1.3.6.1.4.1.7352.3.10.2.5.35.3,instance:0}]'
AND d->>'oid'='1.3 .6.1.4.1.7352.3.10.2.5.35.3' - 必须重复
AND d->>'instance'='0' - 必须重复
AND d-> >'value'<> '1' - 在这里我们可以排除

Voilá。



特殊索引



如果您的表很大,索引大小可能是决定因素。您可以将此特殊解决方案的性能与功能索引进行比较:



此函数从给定的<提取 oid-instance 组合的Postgres数组code> jsonb value:

 创建或替换函数f_config_json2arr(_j jsonb)
RETURNS text [] LANGUAGE sql IMMUTABLE AS
$ func $
SELECT ARRAY(
SELECT(elem->>'oid')||' - '||(elem- >>'instance')
FROM jsonb_array_elements(_j)elem

$ func $

我们可以基于此建立一个功能指数:

  CREATE INDEX configuration_conrfig_special_idx ON配置
使用杜松子酒(f_config_json2arr(config->'data'));

并以此为基础查询:

  SELECT * FROM configuration 
WHERE f_config_json2arr(config->'data')@> '{1.3.6.1.4.1.7352.3.10.2.5.35.3-0}':: text []

这个想法是索引应该大得多,因为它只存储没有键的组合值。 数组 遏制运营商 @> 本身应该与 jsonb遏制运营商 @> 。我不希望有太大的区别,但我会非常感兴趣哪个更快。



与此相关答案中的第一个解决方案类似(但更专业): / p>



旁白:




  • 我不会将 oid 用作列名,因为它也用于内部用途Postgres。

  • 如果可能,我会使用没有JSON的普通规范化表。


I have a json array with around 1000 elements of the structure "oid: aaa, instance:bbb, value:ccc".

{"_id": 37637070
, "data": [{"oid": "11.5.15.1.4", "value": "1", "instance": "1.1.4"}
         , {"oid": "11.5.15.1.9", "value": "17", "instance": "1.1.4"}
         , {"oid": "12.5.15.1.5", "value": "0.0.0.0", "instance": "0"}]}

oid and instance are unique per json array. If I was given the option to change the structure I would have changed the format to key:value:

{"11.5.15.1.4-1.1.4":"1", "11.5.15.1.9-1.1.4": "17", "12.5.15.1.5-0": "0.0.0.0"}

However, if I need to stay with the old structure

  1. What is the fastest way to get specific oid from the array?

  2. What is the fastest way to get a table with 3 columns of oid, instance and value. Or even better a pivot table with oid+instance as the column header.

For 2. I tried the following, but it was pretty slow on a large table:

select *
from (
   select a->>'oid' oid, a->>'instance' instance, a->>'value' value1, id
   from (
      select jsonb_array_elements(config#>'{data}')  a, id
      from configuration
      ) b
   ) c
where  oid = '1.3.6.1.4.1.7352.3.10.2.5.35.3' and instance = '0' and value1 <> '1';

解决方案

Query

Your table definition is missing. Assuming:

CREATE TABLE configuration (
  config_id serial PRIMARY KEY
, config jsonb NOT NULL
);

To find the a value and its row for given oid and instance:

SELECT c.config_id, d->>'value' AS value
FROM   configuration c
     , jsonb_array_elements(config->'data') d  -- default col name is "value"
WHERE  d->>'oid'      = '1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND    d->>'instance' = '0'
AND    d->>'value'   <> '1'

That's an implicit LATERAL join. Compare:

2) What is the fastest way to get a table with 3 columns of oid, instance and value.

I suppose to use jsonb_populate_recordset(), then you can provide data types in the table definition. Assuming text for all:

CREATE TEMP TABLE data_pattern (oid text, value text, instance text);

Could also be a persisted (non-temp) table. This one is only for the current session. Then:

SELECT c.config_id, d.*
FROM   configuration c
     , jsonb_populate_recordset(NULL::data_pattern, c.config->'data') d

That's all. The first query rewritten:

SELECT c.config_id, d.*
FROM   configuration c
     , jsonb_populate_recordset(NULL::data_pattern, c.config->'data') d
WHERE  d.oid      = '1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND    d.instance = '0'
AND    d.value   <> '1';

But that's slower than the first query. Key to performance with bigger table is index support:

Index

You could easily index the normalized (translated) table or the alternative layout you proposed in the question. Indexing your current layout is not as obvious, but also possible. For best performance I suggest a functional index on just the data key with the jsonb_path_ops operator class. Per documentation:

The technical difference between a jsonb_ops and a jsonb_path_ops GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data.

This should work wonders for performance:

CREATE INDEX configuration_my_idx ON configuration
USING gin ((config->'data') jsonb_path_ops);

One might expect that only a complete match for a JSON array element would work, like:

SELECT * FROM configuration
WHERE  (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3"
                            , "instance": "0", "value": "1234"}]';

Note the JSON array notation (with enclosing []) of the provided value, that's required.

But array elements with a subset of keys work as well:

SELECT * FROM configuration
WHERE  (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3"
                            , "instance": "0"}]'

The hard part is to incorporate your seemingly unsuspicious added predicate value <> '1'. Care must be taken to apply all predicates to the same array element. You could combine this with the first query:

SELECT c.*, d->>'value' AS value
FROM   configuration c
     , jsonb_array_elements(config->'data') d
WHERE  (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3", "instance": "0"}]'
AND    d->>'oid'      = '1.3.6.1.4.1.7352.3.10.2.5.35.3'  -- must be repeated
AND    d->>'instance' = '0'                               -- must be repeated
AND    d->>'value'   <> '1'                               -- here we can rule out

Voilá.

Special index

If your table is huge, index size may be a deciding factor. You could compare performance of this special solution with a functional index:

This function extracts a Postgres array of oid-instance combinations from a given jsonb value:

CREATE OR REPLACE FUNCTION f_config_json2arr(_j jsonb)
  RETURNS text[] LANGUAGE sql IMMUTABLE AS
$func$
SELECT ARRAY(
   SELECT (elem->>'oid') || '-' || (elem->>'instance')
   FROM   jsonb_array_elements(_j) elem
   )
$func$

We can build a functional index based on this:

CREATE INDEX configuration_conrfig_special_idx ON configuration
USING  gin (f_config_json2arr(config->'data'));

And base the query on it:

SELECT * FROM configuration
WHERE  f_config_json2arr(config->'data') @> '{1.3.6.1.4.1.7352.3.10.2.5.35.3-0}'::text[]

The idea is that the index should be substantially smaller because it only stores the combined values without keys. The array containment operator @> itself should perform similar to the jsonb containment operator @>. I don't expect a big difference, but I would be very interested which is faster.

Similar to the first solution in this related answer (but more specialized):

Asides:

  • I would not use oid as column name since that's also used for internal purposes in Postgres.
  • If possible, I would use a plain, normalized table without JSON.

这篇关于Postgres 9.4 jsonb数组作为表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆