如何在postgres中编写combinatorics函数? [英] How to write combinatorics function in postgres?

查看:90
本文介绍了如何在postgres中编写combinatorics函数?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有这种形式的PostgreSQL表:

I have a PostgreSQL table of this form:

base_id int | mods smallint[]
     3      |   {7,15,48}

我需要填充这种形式的表:

I need to populate a table of this form:

combo_id int | base_id int | mods smallint[]
     1       |     3       |      
     2       |     3       |      {7}
     3       |     3       |      {7,15}   
     4       |     3       |      {7,48}   
     5       |     3       |      {7,15,48}
     6       |     3       |      {15}
     7       |     3       |      {15,48}
     8       |     3       |      {48}

我认为我可以使用几乎做到这一点的函数来完成此任务,该函数遍历第一个表并将组合写入第二个表: 在SQL中生成所有组合

I think I could accomplish this using a function that does almost exactly this, iterating over the first table and writing combinations to the second table: Generate all combinations in SQL

但是,我是Postgres的新手,我一生都无法弄清楚如何使用plpgsql做到这一点.它并不需要特别快;它只会在后端定期运行.第一张表大约有80条记录,粗略计算表明我们可以预期第二张表大约有2600条记录.

But, I'm a Postgres novice and cannot for the life of me figure out how to do this using plpgsql. It doesn't need to be particularly fast; it will only be run periodically on the backend. The first table has approximately 80 records and a rough calculation suggests we can expect around 2600 records for the second table.

有人能至少指出我正确的方向吗?

Can anybody at least point me in the right direction?

:Craig:我有PostgreSQL 9.0.我可以成功使用UNNEST():

Craig: I've got PostgreSQL 9.0. I was successfully able to use UNNEST():

FOR messvar IN SELECT * FROM UNNEST(mods) AS mod WHERE mod BETWEEN 0 AND POWER(2, @n) - 1
LOOP
    RAISE NOTICE '%', messvar;
END LOOP;

但随后不知道下一步该怎么做.

but then didn't know where to go next.

作为参考,我最终使用了Erwin的解决方案,在其中添加了一行以向每个集合添加一个空结果('{}'),并删除了Erwin提到的特殊情况:

For reference, I ended up using Erwin's solution, with a single line added to add a null result ('{}') to each set and the special case Erwin refers to removed:

CREATE OR REPLACE FUNCTION f_combos(_arr integer[], _a integer[] DEFAULT '{}'::integer[], _z integer[] DEFAULT '{}'::integer[])
RETURNS SETOF integer[] LANGUAGE plpgsql AS
$BODY$
DECLARE
 i int;
 j int;
 _up int;
BEGIN
 IF array_length(_arr,1) > 0 THEN 
    _up := array_upper(_arr, 1);

    IF _a = '{}' AND _z = '{}' THEN RETURN QUERY SELECT '{}'::int[]; END IF;
    FOR i IN array_lower(_arr, 1) .. _up LOOP
       FOR j IN i .. _up  LOOP
          CASE j-i
          WHEN 0,1 THEN
             RETURN NEXT _a || _arr[i:j] || _z;
          ELSE
             RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
             RETURN QUERY SELECT *
             FROM f_combos(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
          END CASE;
       END LOOP;
    END LOOP;
 ELSE
    RETURN NEXT _arr;
 END IF;
END;
$BODY$

然后,我使用该函数填充表格:

Then, I used that function to populate my table:

INSERT INTO e_ecosystem_modified (ide_ecosystem, modifiers) 
(SELECT ide_ecosystem, f_combos(modifiers) AS modifiers FROM e_ecosystem WHERE ecosystemgroup <> 'modifier' ORDER BY ide_ecosystem, modifiers);

在源表中的79行中,修饰符数组中最多包含7个项目,查询用了250ms的时间在我的输出表中填充了2630行.很棒.

From 79 rows in my source table with a maximum of 7 items in the modifiers array, the query took 250ms to populate 2630 rows in my output table. Fantastic.

推荐答案

在我睡过之后,我有了一个全新,更简单,更快的主意:

After I slept over it I had a completely new, simpler, faster idea:

CREATE OR REPLACE FUNCTION f_combos(_arr anyarray)
  RETURNS TABLE (combo anyarray) LANGUAGE plpgsql AS
$BODY$
BEGIN
    IF array_upper(_arr, 1) IS NULL THEN
        combo := _arr; RETURN NEXT; RETURN;
    END IF;

    CASE array_upper(_arr, 1)
--  WHEN 0 THEN -- does not exist
    WHEN 1 THEN
        RETURN QUERY VALUES ('{}'), (_arr);
    WHEN 2 THEN
        RETURN QUERY VALUES ('{}'), (_arr[1:1]), (_arr), (_arr[2:2]);
    ELSE
        RETURN QUERY
        WITH x AS (
            SELECT f.combo FROM f_combos(_arr[1:array_upper(_arr, 1)-1]) f
            )
        SELECT x.combo FROM x
        UNION ALL
        SELECT x.combo || _arr[array_upper(_arr, 1)] FROM x;
    END CASE;
END
$BODY$;

致电:

SELECT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;

512行,总运行时间:2.899毫秒

512 rows, total runtime: 2.899 ms

  • 使用NULL和空数组处理特殊情况.
  • 为两个原始数组构建组合.
  • 任何更长的数组都分解为:
    • 长度为n-1的相同数组的组合
    • 加上所有与元素n组合的元素.递归.
    • Treat special cases with NULL and empty array.
    • Build combinations for a primitive array of two.
    • Any longer array is broken down into:
      • the combinations for same array of length n-1
      • plus all of those combined with element n .. recursively.

      一经掌握,便非常简单.

      Really simple, once you got it.

      • 适用于以下标1 开头的一维整数数组(请参见下文).
      • 速度是旧解决方案的2-3倍,扩展性更好.
      • 再次使用任何元素类型(使用多态类型).
      • 在问题中(以及@Craig在评论中向我指出)中显示的结果中包含空数组.
      • 更短,更优雅.
      • Works for 1-dimensional integer arrays starting with subscript 1 (see below).
      • 2-3 times as fast as old solution, scales better.
      • Works for any element type again (using polymorphic types).
      • Includes the empty array in the result as is displayed in the question (and as @Craig pointed out to me in the comments).
      • Shorter, more elegant.

      这假定 数组下标 1 (默认)开始.如果您不确定自己的值,请调用以下函数进行标准化:

      This assumes array subscripts starting at 1 (Default). If you are not sure about your values, call the function like this to normalize:

      SELECT * FROM  f_combos(_arr[array_lower(_arr, 1):array_upper(_arr, 1)]);
      

      不确定是否有更优雅的方法来标准化数组下标.我对此发表了一个问题:
      对一维数组的数组下标进行归一化,使其以1开头

      Not sure if there is a more elegant way to normalize array subscripts. I posted a question about that:
      Normalize array subscripts for 1-dimensional array so they start with 1

      CREATE OR REPLACE FUNCTION f_combos2(_arr int[], _a int[] = '{}', _z int[] = '{}')
       RETURNS SETOF int[] LANGUAGE plpgsql AS
      $BODY$
      DECLARE
         i int;
         j int;
         _up int;
      BEGIN
         IF array_length(_arr,1) > 0 THEN 
            _up := array_upper(_arr, 1);
      
            FOR i IN array_lower(_arr, 1) .. _up LOOP
               FOR j IN i .. _up  LOOP
                  CASE j-i
                  WHEN 0,1 THEN
                     RETURN NEXT _a || _arr[i:j] || _z;
                  WHEN 2 THEN
                     RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
                     RETURN NEXT _a || _arr[i:j] || _z;
                  ELSE
                     RETURN NEXT _a || _arr[i:i] || _arr[j:j] || _z;
                     RETURN QUERY SELECT *
                     FROM f_combos2(_arr[i+1:j-1], _a || _arr[i], _arr[j] || _z);
                  END CASE;
               END LOOP;
            END LOOP;
         ELSE
            RETURN NEXT _arr;
         END IF;
      END;
      $BODY$;
      

      致电:

      SELECT * FROM f_combos2('{7,15,48}'::int[]) ORDER BY 1;
      

      适用于一维整数数组. 可以进一步优化,但是对于此问题的范围肯定不是必需的.
      ORDER BY强制显示问题中显示的顺序.

      Works for 1-dimensional integer arrays. This could be further optimized, but that's certainly not needed for the scope of this question.
      ORDER BY to impose the order displayed in the question.

      提供NULL或空数组,因为注释中提到了NULL.

      Provide for NULL or empty array, as NULL is mentioned in the comments.

      已在PostgreSQL 9.1上进行了测试,但可以与任何中途的现代版本一起使用. array_lower()array_upper() 至少自PostgreSQL 7.4起就存在了.在版本8.4中,只有参数默认值是新的.可以轻松更换.

      Tested with PostgreSQL 9.1, but should work with any halfway modern version. array_lower() and array_upper() have been around for at least since PostgreSQL 7.4. Only parameter defaults are new in version 8.4. Could easily be replaced.

      表现不错.

      SELECT DISTINCT * FROM f_combos('{1,2,3,4,5,6,7,8,9}'::int[]) ORDER BY 1;
      

      511行,总运行时间:7.729毫秒

      511 rows, total runtime: 7.729 ms

      它基于此简单表单构建,该表单仅创建相邻元素的所有组合:

      It builds on this simple form that only creates all combinations of neighboring elements:

      CREATE FUNCTION f_combos(_arr int[])
        RETURNS SETOF int[] LANGUAGE plpgsql AS
      $BODY$
      DECLARE
         i  int;
         j  int;
        _up int;
      BEGIN
         _up := array_upper(_arr, 1);
      
         FOR i in array_lower(_arr, 1) .. _up LOOP
            FOR j in i .. _up LOOP
               RETURN NEXT _arr[i:j];
            END LOOP;
         END LOOP;
      END;
      $BODY$;
      

      但是,这对于包含两个以上元素的子数组将失败.因此:

      But this will fail for sub-arrays with more than two elements. So:

      • 对于任何具有3个元素的子数组,将添加一个仅包含外部两个元素的数组.这是提高性能的特殊情况的快捷方式,并且并非严格要求.

      对于具有3个以上元素的任何子数组,我采用两个外部元素并填充由同一函数构建的内部元素的所有组合 递归.

      For any sub-array with more than 3 elements I take the outer two elements and fill in with all combinations of inner elements built by the same function recursively.

      这篇关于如何在postgres中编写combinatorics函数?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆