如何从按索引存储数组元素的规范化表中获取数组? [英] How to get arrays from a normalised table that stores array elements by index?
问题描述
我有一个表,按它们所属的数组存储数组元素,并且它们在数组中的索引.看起来很聪明,因为这些数组是期望稀疏,并分别更新其元素.假设这是表格:
I have a table storing array elements by the array they belong to and their index in the array. It seemed smart because the arrays were expected to be sparse, and have their elements updated individually. Let's say this is the table:
CREATE TABLE values (
pk TEXT,
i INTEGER,
value REAL,
PRIMARY KEY (pk, i)
);
pk | i | value
----+---+-------
A | 0 | 17.5
A | 1 | 32.7
A | 3 | 5.3
B | 1 | 13.5
B | 2 | 4.8
B | 4 | 89.1
现在,我想将它们作为真实数组获得,即A的 {17.5,32.7,NULL,53}
和 {NULL,13.5,4.8,NULL,89.1} >代码> B.
Now I would like to get these as real arrays, i.e. {17.5, 32.7, NULL, 53}
for A and {NULL, 13.5, 4.8, NULL, 89.1}
for B.
我希望通过分组查询很容易实现以及适当的汇总函数.然而,事实证明那里没有这样的函数可以通过其索引将元素放入数组(或下标,如postgres所称).如果元素是连续的-我只可以将 array_agg
与 ORDER BY i
.但是我想要结果中的空值数组.
I would have expected that it's easily possible with a grouping query
and an appropriate aggregate function. However, it turned out that there
is no such function that puts elements into an array by its index (or
subscript, as postgres calls it). It would've been much simpler if the
elements were successive - I just could've used array_agg
with
ORDER BY i
. But I want the null values in the result
arrays.
我最终得到的是这个怪物:
What I ended up with was this monster:
SELECT
pk,
ARRAY( SELECT
( SELECT value
FROM values innervals
WHERE innervals.pk = outervals.pk AND i = generate_series
)
FROM generate_series(0, MAX(i))
ORDER BY generate_series -- is this really necessary?
)
FROM values outervals
GROUP BY pk;
不得不两次 SELECT…FROM values
很丑陋,而且查询计划程序似乎无法对此进行优化.
Having to SELECT … FROM values
twice is ugly, and the query planner doesn't seem to be able to optimise this.
是否有一种简单的方法可以将分组的行作为子查询中的关系引用,这样我就可以从generate_series(0,MAX(i))中选择值左联接???
?
通过定义自定义聚合函数?
Edit :似乎我在找的东西可以通过多参数 unnest
和 array_agg
来实现,尽管它不是特别优雅:
Edit: It seems what I was looking for is possible with multiple-argument unnest
and array_agg
, although it is not particularly elegant:
SELECT
pk,
ARRAY( SELECT val
FROM generate_series(0, MAX(i)) AS series (series_i)
LEFT OUTER JOIN
unnest( array_agg(value ORDER BY i),
array_agg(i ORDER BY i) ) AS arr (val, arr_i)
ON arr_i = series_i
ORDER BY series_i
)
FROM values
GROUP BY pk;
查询计划程序甚至似乎意识到,它可以对已排序的 series_i
和 arr_i
进行已排序的合并 JOIN
,尽管我需要付出更多的努力才能真正理解 EXPLAIN
的输出. Edit 2 :实际上,这是 series_i
和 arr_i
之间的哈希联接,只有外部组聚合使用排序"策略.
The query planner even seems to realise that it can do a sorted merge , although I need to put some more effort in really understanding the JOIN
on the sorted series_i
and arr_i
EXPLAIN
output. Edit 2: It's actually a hash join between series_i
and arr_i
, only the outer group aggregation uses a "sorted" strategy.
推荐答案
通过定义自定义聚合函数?
它至少可以显着简化查询:
It does at least simplify the query significantly:
SELECT pk, array_by_subscript(i+1, value)
FROM "values"
GROUP BY pk;
使用
CREATE FUNCTION array_set(arr anyarray, index int, val anyelement) RETURNS anyarray
AS $$
BEGIN
arr[index] = val;
RETURN arr;
END
$$ LANGUAGE plpgsql STRICT;
CREATE FUNCTION array_fillup(arr anyarray) RETURNS anyarray
AS $$
BEGIN
-- necessary for nice to_json conversion of arrays that don't start at subscript 1
IF array_lower(arr, 1) > 1 THEN
arr[1] = NULL;
END IF;
RETURN arr;
END
$$ LANGUAGE plpgsql STRICT;
CREATE AGGREGATE array_by_subscript(int, anyelement) (
sfunc = array_set,
stype = anyarray,
initcond = '{}',
finalfunc = array_fillup
);
在线示例.它还有一个不错的查询计划,可以对 values
进行简单的线性扫描,我必须确定. array_set
增长数组的效率
根据 EXPLAIN ANALYZE
基准测试,在合理大小的测试数据集上,这实际上是最快的解决方案.相较于 ARRAY + UNNEST
解决方案的大约80ms,它花费了55ms,并且比针对通用表表达式的联接的160ms要快得多.
Online example. It also has a nice query plan that does a simple linear scan on the values
, I'll have to benchmark how efficient .array_set
is at growing the array
This is in fact the fastest solution, according to an EXPLAIN ANALYZE
benchmark on a reasonably-sized test data set. It took 55ms, compared to about 80ms of the ARRAY + UNNEST
solution, and is considerably faster than the 160ms of the join against the common table expression.
这篇关于如何从按索引存储数组元素的规范化表中获取数组?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!