Postgres 9.4 jsonb数组作为表 [英] Postgres 9.4 jsonb array as table
问题描述
我有一个json数组,其中包含大约1000个结构元素oid:aaa,instance:bbb,value:ccc。
{_ id:37637070
,data:[{oid:11.5.15.1。 4,value:1,instance:1.1.4}
,{oid:11.5.15.1.9,value:17,instance :1.1.4}
,{oid:12.5.15.1.5,value:0.0.0.0,instance:0}]}
oid
和 instance
是每个json数组唯一的。如果我可以选择更改结构,我会将格式更改为键:值:
{11.5.15.1.4-1.1.4:1,11.5.15.1.9-1.1.4:17,12.5.15.1.5-0:0.0.0.0 }
但是,如果我需要保留旧结构
-
从阵列中获取特定
oid
的最快方法是什么? -
获得包含3列
oid
,instance $ c $的表的最快方法是什么? c>和
值
。或者更好的是带有oid + instance作为列标题的数据透视表。
对于2.我尝试了以下方法,但是在一张大桌子上它很慢:
选择*
来自(
选择a-> ;>'oid'oid,a->>'instance'实例,a->>'value'value1,id
from(
select jsonb_array_elements(config#>'{ data}')a,id
来自配置
)b
)c
其中oid ='1.3.6.1.4.1.7352.3.10.2.5.35.3'和instance = '0'和value1<> 1’ ;
查询
缺少您的表定义。假设:
CREATE TABLE配置(
config_id serial PRIMARY KEY
,config jsonb NOT NULL
);
查找值
及其行对于给定的 oid
和实例
:
SELECT c.config_id,d->>'value'AS value
FROM configuration c
,jsonb_array_elements(config->'data')d - 默认col名称是值
WHERE d->>'oid'='1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND d->>'instance'=' 0'
AND d->>'value'<> '1'
这是一个隐含的 LATERAL
加入。比较:
2)获得一个包含3列
oid
,instance
和<$ c $的表的最快方法是什么? c> value。
我想使用 jsonb_populate_recordset()
,然后您可以在表定义中提供数据类型。假设 text
for all:
CREATE TEMP TABLE data_pattern(oid text,值文本,实例文本);
也可以是持久(非临时)表。这个仅适用于本届会议。然后:
SELECT c.config_id,d。*
FROM configuration c
,jsonb_populate_recordset(NULL: :data_pattern,c.config->'data')d
这就是全部。第一个查询被重写:
SELECT c.config_id,d。*
FROM configuration c
,jsonb_populate_recordset (NULL :: data_pattern,c.config->'data')d
WHERE d.oid ='1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND d.instance = '0'
AND d.value<> 1’ ;
但是比第一个查询慢。使用更大表的性能的关键是索引支持:
索引
您可以轻松索引规范化(已翻译)表或您在问题中提出的替代布局。索引当前布局并不是那么明显,但也可能。为了获得最佳性能,我建议在数据
键上使用 jsonb_path_ops
运算符类进行功能索引。 每个文档:
jsonb_ops
与jsonb_path_ops $ c $之间的技术差异c> GIN
索引是前者为每个键
和数据中的值创建独立索引项,而后者仅为
中的每个值创建索引项。
这个应该可以创造奇迹:
CREATE INDEX configuration_my_idx ON配置
使用gin((config->'data')jsonb_path_ops);
有人可能会认为只有JSON数组元素的完全匹配才能起作用,如:
SELECT * FROM configuration
WHERE(config->'data')@> '[{oid:1.3.6.1.4.1.7352.3.10.2.5.35.3
,instance:0,value:1234}]';
注意JSON数组表示法(包围 [提供值的
),这是必需的。
但是具有子集的数组元素密钥也可以工作:
SELECT * FROM configuration
WHERE(config->'data' )@> '[{oid:1.3.6.1.4.1.7352.3.10.2.5.35.3
,instance:0}]'
困难的部分是加入你看似不吉利的添加谓词值<> 1’
。必须小心将所有谓词应用于相同的数组元素。你可以将它与第一个查询结合起来:
SELECT c。*,d->>'value'AS value
FROM配置c
,jsonb_array_elements(config->'data')d
WHERE(config->'data')@> '[{oid:1.3.6.1.4.1.7352.3.10.2.5.35.3,instance:0}]'
AND d->>'oid'='1.3 .6.1.4.1.7352.3.10.2.5.35.3' - 必须重复
AND d->>'instance'='0' - 必须重复
AND d-> >'value'<> '1' - 在这里我们可以排除
Voilá。
特殊索引
如果您的表很大,索引大小可能是决定因素。您可以将此特殊解决方案的性能与功能索引进行比较:
此函数从给定的<提取 oid-instance 组合的Postgres数组code> jsonb value:
创建或替换函数f_config_json2arr(_j jsonb)
RETURNS text [] LANGUAGE sql IMMUTABLE AS
$ func $
SELECT ARRAY(
SELECT(elem->>'oid')||' - '||(elem- >>'instance')
FROM jsonb_array_elements(_j)elem
)
$ func $
我们可以基于此建立一个功能指数:
CREATE INDEX configuration_conrfig_special_idx ON配置
使用杜松子酒(f_config_json2arr(config->'data'));
并以此为基础查询:
SELECT * FROM configuration
WHERE f_config_json2arr(config->'data')@> '{1.3.6.1.4.1.7352.3.10.2.5.35.3-0}':: text []
这个想法是索引应该大得多,因为它只存储没有键的组合值。 数组 遏制运营商 @>
本身应该与 jsonb遏制运营商 @>
强> 。我不希望有太大的区别,但我会非常感兴趣哪个更快。
与此相关答案中的第一个解决方案类似(但更专业): / p>
旁白:
- 我不会将
oid
用作列名,因为它也用于内部用途Postgres。 - 如果可能,我会使用没有JSON的普通规范化表。
I have a json array with around 1000 elements of the structure "oid: aaa, instance:bbb, value:ccc".
{"_id": 37637070
, "data": [{"oid": "11.5.15.1.4", "value": "1", "instance": "1.1.4"}
, {"oid": "11.5.15.1.9", "value": "17", "instance": "1.1.4"}
, {"oid": "12.5.15.1.5", "value": "0.0.0.0", "instance": "0"}]}
oid
and instance
are unique per json array. If I was given the option to change the structure I would have changed the format to key:value:
{"11.5.15.1.4-1.1.4":"1", "11.5.15.1.9-1.1.4": "17", "12.5.15.1.5-0": "0.0.0.0"}
However, if I need to stay with the old structure
What is the fastest way to get specific
oid
from the array?What is the fastest way to get a table with 3 columns of
oid
,instance
andvalue
. Or even better a pivot table with oid+instance as the column header.
For 2. I tried the following, but it was pretty slow on a large table:
select *
from (
select a->>'oid' oid, a->>'instance' instance, a->>'value' value1, id
from (
select jsonb_array_elements(config#>'{data}') a, id
from configuration
) b
) c
where oid = '1.3.6.1.4.1.7352.3.10.2.5.35.3' and instance = '0' and value1 <> '1';
Query
Your table definition is missing. Assuming:
CREATE TABLE configuration (
config_id serial PRIMARY KEY
, config jsonb NOT NULL
);
To find the a value
and its row for given oid
and instance
:
SELECT c.config_id, d->>'value' AS value
FROM configuration c
, jsonb_array_elements(config->'data') d -- default col name is "value"
WHERE d->>'oid' = '1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND d->>'instance' = '0'
AND d->>'value' <> '1'
That's an implicit LATERAL
join. Compare:
2) What is the fastest way to get a table with 3 columns of
oid
,instance
andvalue.
I suppose to use jsonb_populate_recordset()
, then you can provide data types in the table definition. Assuming text
for all:
CREATE TEMP TABLE data_pattern (oid text, value text, instance text);
Could also be a persisted (non-temp) table. This one is only for the current session. Then:
SELECT c.config_id, d.*
FROM configuration c
, jsonb_populate_recordset(NULL::data_pattern, c.config->'data') d
That's all. The first query rewritten:
SELECT c.config_id, d.*
FROM configuration c
, jsonb_populate_recordset(NULL::data_pattern, c.config->'data') d
WHERE d.oid = '1.3.6.1.4.1.7352.3.10.2.5.35.3'
AND d.instance = '0'
AND d.value <> '1';
But that's slower than the first query. Key to performance with bigger table is index support:
Index
You could easily index the normalized (translated) table or the alternative layout you proposed in the question. Indexing your current layout is not as obvious, but also possible. For best performance I suggest a functional index on just the data
key with the jsonb_path_ops
operator class. Per documentation:
The technical difference between a
jsonb_ops
and ajsonb_path_ops
GIN index is that the former creates independent index items for each key and value in the data, while the latter creates index items only for each value in the data.
This should work wonders for performance:
CREATE INDEX configuration_my_idx ON configuration
USING gin ((config->'data') jsonb_path_ops);
One might expect that only a complete match for a JSON array element would work, like:
SELECT * FROM configuration
WHERE (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3"
, "instance": "0", "value": "1234"}]';
Note the JSON array notation (with enclosing []
) of the provided value, that's required.
But array elements with a subset of keys work as well:
SELECT * FROM configuration
WHERE (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3"
, "instance": "0"}]'
The hard part is to incorporate your seemingly unsuspicious added predicate value <> '1'
. Care must be taken to apply all predicates to the same array element. You could combine this with the first query:
SELECT c.*, d->>'value' AS value
FROM configuration c
, jsonb_array_elements(config->'data') d
WHERE (config->'data') @> '[{"oid": "1.3.6.1.4.1.7352.3.10.2.5.35.3", "instance": "0"}]'
AND d->>'oid' = '1.3.6.1.4.1.7352.3.10.2.5.35.3' -- must be repeated
AND d->>'instance' = '0' -- must be repeated
AND d->>'value' <> '1' -- here we can rule out
Voilá.
Special index
If your table is huge, index size may be a deciding factor. You could compare performance of this special solution with a functional index:
This function extracts a Postgres array of oid-instance combinations from a given jsonb
value:
CREATE OR REPLACE FUNCTION f_config_json2arr(_j jsonb)
RETURNS text[] LANGUAGE sql IMMUTABLE AS
$func$
SELECT ARRAY(
SELECT (elem->>'oid') || '-' || (elem->>'instance')
FROM jsonb_array_elements(_j) elem
)
$func$
We can build a functional index based on this:
CREATE INDEX configuration_conrfig_special_idx ON configuration
USING gin (f_config_json2arr(config->'data'));
And base the query on it:
SELECT * FROM configuration
WHERE f_config_json2arr(config->'data') @> '{1.3.6.1.4.1.7352.3.10.2.5.35.3-0}'::text[]
The idea is that the index should be substantially smaller because it only stores the combined values without keys. The array containment operator @>
itself should perform similar to the jsonb containment operator @>
. I don't expect a big difference, but I would be very interested which is faster.
Similar to the first solution in this related answer (but more specialized):
Asides:
- I would not use
oid
as column name since that's also used for internal purposes in Postgres. - If possible, I would use a plain, normalized table without JSON.
这篇关于Postgres 9.4 jsonb数组作为表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!