如何从按索引存储数组元素的规范化表中获取数组? [英] How to get arrays from a normalised table that stores array elements by index?

查看:62
本文介绍了如何从按索引存储数组元素的规范化表中获取数组?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个表,按它们所属的数组存储数组元素,并且它们在数组中的索引.看起来很聪明,因为这些数组是期望稀疏,并分别更新其元素.假设这是表格:

I have a table storing array elements by the array they belong to and their index in the array. It seemed smart because the arrays were expected to be sparse, and have their elements updated individually. Let's say this is the table:

CREATE TABLE values (
    pk TEXT,
    i INTEGER,
    value REAL,
    PRIMARY KEY (pk, i)
);

 pk | i | value
----+---+-------
 A  | 0 | 17.5
 A  | 1 | 32.7
 A  | 3 | 5.3
 B  | 1 | 13.5
 B  | 2 | 4.8
 B  | 4 | 89.1

现在,我想将它们作为真实数组获得,即A的 {17.5,32.7,NULL,53} {NULL,13.5,4.8,NULL,89.1} 代码> B.

Now I would like to get these as real arrays, i.e. {17.5, 32.7, NULL, 53} for A and {NULL, 13.5, 4.8, NULL, 89.1} for B.

我希望通过分组查询很容易实现以及适当的汇总函数.然而,事实证明那里没有这样的函数可以通过其索引将元素放入数组(或下标,如postgres所称).如果元素是连续的-我只可以将 array_agg ORDER BY i .但是我想要结果中的空值数组.

I would have expected that it's easily possible with a grouping query and an appropriate aggregate function. However, it turned out that there is no such function that puts elements into an array by its index (or subscript, as postgres calls it). It would've been much simpler if the elements were successive - I just could've used array_agg with ORDER BY i. But I want the null values in the result arrays.

我最终得到的是这个怪物:

What I ended up with was this monster:

SELECT
  pk,
  ARRAY( SELECT
    ( SELECT value
      FROM values innervals
      WHERE innervals.pk = outervals.pk AND i = generate_series
    )
    FROM generate_series(0, MAX(i))
    ORDER BY generate_series -- is this really necessary?
  )
FROM values outervals
GROUP BY pk;

不得不两次 SELECT…FROM values 很丑陋,而且查询计划程序似乎无法对此进行优化.

Having to SELECT … FROM values twice is ugly, and the query planner doesn't seem to be able to optimise this.

是否有一种简单的方法可以将分组的行作为子查询中的关系引用,这样我就可以从generate_series(0,MAX(i))中选择值左联接??? ?

通过定义自定义聚合函数 Edit :似乎我在找的东西可以通过多参数 unnest array_agg 来实现,尽管它不是特别优雅:

Edit: It seems what I was looking for is possible with multiple-argument unnest and array_agg, although it is not particularly elegant:

SELECT
  pk,
  ARRAY( SELECT val
    FROM generate_series(0, MAX(i)) AS series (series_i)
    LEFT OUTER JOIN
      unnest( array_agg(value ORDER BY i),
              array_agg(i ORDER BY i) ) AS arr (val, arr_i)
      ON arr_i = series_i
    ORDER BY series_i
  )
FROM values
GROUP BY pk;

查询计划程序甚至似乎意识到,它可以对已排序的 series_i arr_i 进行已排序的合并 JOIN ,尽管我需要付出更多的努力才能真正理解 EXPLAIN 的输出. Edit 2 :实际上,这是 series_i arr_i 之间的哈希联接,只有外部组聚合使用排序"策略.

The query planner even seems to realise that it can do a sorted merge JOIN on the sorted series_i and arr_i, although I need to put some more effort in really understanding the EXPLAIN output. Edit 2: It's actually a hash join between series_i and arr_i, only the outer group aggregation uses a "sorted" strategy.

推荐答案

通过定义自定义聚合函数它至少可以显着简化查询:

It does at least simplify the query significantly:

SELECT pk, array_by_subscript(i+1, value)
FROM "values"
GROUP BY pk;

使用

CREATE FUNCTION array_set(arr anyarray, index int, val anyelement) RETURNS anyarray
AS $$
BEGIN
    arr[index] = val;
    RETURN arr;
END
$$ LANGUAGE plpgsql STRICT;

CREATE FUNCTION array_fillup(arr anyarray) RETURNS anyarray
AS $$
BEGIN
   -- necessary for nice to_json conversion of arrays that don't start at subscript 1
   IF array_lower(arr, 1) > 1 THEN
       arr[1] = NULL;
   END IF;
   RETURN arr;
END
$$ LANGUAGE plpgsql STRICT;

CREATE AGGREGATE array_by_subscript(int, anyelement) (
 sfunc = array_set,
 stype = anyarray,
 initcond = '{}',
 finalfunc = array_fillup
);

在线示例.它还有一个不错的查询计划,可以对 values 进行简单的线性扫描,我必须确定 array_set 增长数组的效率.
根据 EXPLAIN ANALYZE 基准测试,在合理大小的测试数据集上,这实际上是最快的解决方案.相较于 ARRAY + UNNEST 解决方案的大约80ms,它花费了55ms,并且比针对通用表表达式的联接的160ms要快得多.

Online example. It also has a nice query plan that does a simple linear scan on the values, I'll have to benchmark how efficient array_set is at growing the array.
This is in fact the fastest solution, according to an EXPLAIN ANALYZE benchmark on a reasonably-sized test data set. It took 55ms, compared to about 80ms of the ARRAY + UNNEST solution, and is considerably faster than the 160ms of the join against the common table expression.

这篇关于如何从按索引存储数组元素的规范化表中获取数组?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆