Dynamic alternative to pivot with CASE and GROUP BY

Question

I have a table that looks like this:

id    feh    bar
1     10     A
2     20     A
3      3     B
4      4     B
5      5     C
6      6     D
7      7     D
8      8     D

And I want it to look like this:

bar  val1   val2   val3
A     10     20 
B      3      4 
C      5        
D      6      7     8

I have this query:

SELECT bar, 
   MAX(CASE WHEN abc."row" = 1 THEN feh ELSE NULL END) AS "val1",
   MAX(CASE WHEN abc."row" = 2 THEN feh ELSE NULL END) AS "val2",
   MAX(CASE WHEN abc."row" = 3 THEN feh ELSE NULL END) AS "val3"
FROM
(
  SELECT bar, feh, row_number() OVER (partition by bar order by feh) as row  -- ORDER BY makes the numbering deterministic
  FROM "Foo"
 ) abc
GROUP BY bar

This is a very makeshift approach and gets unwieldy if there are a lot of new columns to be created. I was wondering whether the CASE expressions can be improved to make this query more dynamic. Also, I'd love to see other approaches to doing this.

Answer

If you have not installed the additional module tablefunc, run this command once per database:

CREATE EXTENSION tablefunc;
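
To check whether tablefunc is already installed in the current database, one option is to query the pg_extension catalog. The IF NOT EXISTS variant of the command above is also safe to re-run. A small sketch:

```sql
-- Idempotent variant: does nothing if tablefunc is already installed.
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Shows the installed tablefunc extension and its version (no row if absent).
SELECT extname, extversion
FROM   pg_extension
WHERE  extname = 'tablefunc';
```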

Answer to question

A very basic crosstab solution for your case:

SELECT * FROM crosstab(
  'SELECT bar, 1 AS cat, feh
   FROM   tbl_org
   ORDER  BY bar, feh')
 AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?

The special difficulty here is that there is no category (cat) in the base table. For the basic 1-parameter form we can just provide a dummy column with a dummy value serving as category. The value is ignored anyway.

This is one of the rare cases where the second parameter of crosstab() is not needed: by definition of this problem, all NULL values only appear in dangling columns to the right. And the order of values within each row can be determined by the value itself (ORDER BY bar, feh).
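
To see why the dangling-NULL property matters, here is a small sketch with a hypothetical table demo_gap in which row 'E' has a value for val2 but none for val1. The 1-parameter form ignores the category and fills columns left to right, so E's value shifts into val1; the 2-parameter form places it by category name.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical demo data: 'E' lacks 'val1', so its gap is NOT on the right.
CREATE TEMP TABLE demo_gap (row_name text, attrib text, val int);
INSERT INTO demo_gap (row_name, attrib, val) VALUES
   ('D', 'val1', 6)
 , ('D', 'val2', 7)
 , ('E', 'val2', 9);

-- 1-parameter form: category ignored, values fill columns left to right.
-- Row E comes out as val1 = 9, val2 = NULL (shifted into the wrong column).
SELECT * FROM crosstab(
   'SELECT row_name, attrib, val FROM demo_gap ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int);

-- 2-parameter form: values are matched against the category list.
-- Row E comes out as val1 = NULL, val2 = 9.
SELECT * FROM crosstab(
   'SELECT row_name, attrib, val FROM demo_gap ORDER BY 1'
 , $$VALUES ('val1'), ('val2')$$)
AS ct (row_name text, val1 int, val2 int);
```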

If we had an actual category column with names determining the order of values in the result, we'd need the 2-parameter form of crosstab(). Here I synthesize a category column with the help of the window function row_number(), to base crosstab() on:

SELECT * FROM crosstab(
   $$
   SELECT bar, val, feh
   FROM  (
      SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
      FROM tbl_org
      ) x
   ORDER BY 1, 2
   $$
 , $$VALUES ('val1'), ('val2'), ('val3')$$         -- more columns?
) AS ct (bar text, val1 int, val2 int, val3 int);  -- more columns?
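
It can help to run the source query on its own to see the category column that row_number() synthesizes. A sketch on a hypothetical temporary copy of the question's data (demo_org stands in for tbl_org):

```sql
-- Hypothetical copy of the question's data so this block runs on its own.
CREATE TEMP TABLE demo_org (id int, feh int, bar text);
INSERT INTO demo_org (id, feh, bar) VALUES
   (1, 10, 'A'), (2, 20, 'A'), (3, 3, 'B'), (4, 4, 'B')
 , (5, 5, 'C'), (6, 6, 'D'), (7, 7, 'D'), (8, 8, 'D');

-- Source query of the 2-parameter call: row_number() numbers the values of
-- each bar in feh order, and the 'val' prefix turns that number into a name.
SELECT bar, val, feh
FROM  (
   SELECT *, 'val' || row_number() OVER (PARTITION BY bar ORDER BY feh) AS val
   FROM   demo_org
   ) x
ORDER  BY 1, 2;
-- For bar 'D' this yields (D, val1, 6), (D, val2, 7), (D, val3, 8).
```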

The rest is pretty much run-of-the-mill. Find more explanation and links in these closely related answers.

Basics:
Read this first if you are not familiar with the crosstab() function!

Advanced:

That's how you should provide a test case to begin with:

CREATE TEMP TABLE tbl_org (id int, feh int, bar text);
INSERT INTO tbl_org (id, feh, bar) VALUES
   (1, 10, 'A')
 , (2, 20, 'A')
 , (3,  3, 'B')
 , (4,  4, 'B')
 , (5,  5, 'C')
 , (6,  6, 'D')
 , (7,  7, 'D')
 , (8,  8, 'D');

Dynamic crosstab?

Not very dynamic, yet, as @Clodoaldo commented. Dynamic return types are hard to achieve with plpgsql. But there are ways around it - with some limitations.

So as not to complicate the rest further, I demonstrate with a simpler test case:

CREATE TEMP TABLE tbl (row_name text, attrib text, val int);
INSERT INTO tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('A', 'val2', 20)
 , ('B', 'val1', 3)
 , ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7);

Call:

SELECT * FROM crosstab('SELECT row_name, attrib, val FROM tbl ORDER BY 1,2')
AS ct (row_name text, val1 int, val2 int, val3 int);

Returns:

 row_name | val1 | val2 | val3
----------+------+------+------
 A        | 10   | 20   |
 B        |  3   |  4   |
 C        |  5   |      |
 D        |  6   |  7   |  8

Built-in feature of tablefunc module

The tablefunc module provides a simple infrastructure for generic crosstab() calls without providing a column definition list. A number of functions written in C (typically very fast):

crosstabN()

crosstab2() - crosstab4() are predefined. One minor point: they require and return text for all columns, so we need to cast our integer values. But it simplifies the call:

SELECT * FROM crosstab4('SELECT row_name, attrib, val::text  -- cast!
                         FROM tbl ORDER BY 1,2');

Result:

 row_name | category_1 | category_2 | category_3 | category_4
----------+------------+------------+------------+------------
 A        | 10         | 20         |            |
 B        | 3          | 4          |            |
 C        | 5          |            |            |
 D        | 6          | 7          | 8          |
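
Each predefined crosstabN() has a fixed number of value columns. According to the tablefunc documentation, surplus input rows in a group are skipped, so a row with more values than output columns is truncated without an error. A sketch with a hypothetical demo_xt table, where row 'D' has three values but crosstab2() only has two value columns:

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

-- Hypothetical demo data: 'D' has three values.
CREATE TEMP TABLE demo_xt (row_name text, attrib text, val int);
INSERT INTO demo_xt (row_name, attrib, val) VALUES
   ('A', 'val1', 10)
 , ('D', 'val1', 6)
 , ('D', 'val2', 7)
 , ('D', 'val3', 8);

-- crosstab2() returns row_name, category_1, category_2 (all text).
-- Row D comes out as 6, 7: the third value (8) is silently skipped.
SELECT * FROM crosstab2('SELECT row_name, attrib, val::text  -- cast!
                         FROM demo_xt ORDER BY 1,2');
```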

Custom crosstab() function

For more columns or other data types, we create our own composite type and function (once).
Type:

CREATE TYPE tablefunc_crosstab_int_5 AS (
  row_name text, val1 int, val2 int, val3 int, val4 int, val5 int);

Function:

CREATE OR REPLACE FUNCTION crosstab_int_5(text)
  RETURNS SETOF tablefunc_crosstab_int_5
AS '$libdir/tablefunc', 'crosstab' LANGUAGE c STABLE STRICT;

Call:

SELECT * FROM crosstab_int_5('SELECT row_name, attrib, val   -- no cast!
                              FROM tbl ORDER BY 1,2');

Result:

 row_name | val1 | val2 | val3 | val4 | val5
----------+------+------+------+------+------
 A        |   10 |   20 |      |      |
 B        |    3 |    4 |      |      |
 C        |    5 |      |      |      |
 D        |    6 |    7 |    8 |      |

One polymorphic, dynamic function for all

This goes beyond what's covered by the tablefunc module.
To make the return type dynamic I use a polymorphic type with a technique detailed in this related answer:

1-parameter form:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _rowtype anyelement)
  RETURNS SETOF anyelement AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
   (SELECT format('SELECT * FROM crosstab(%L) t(%s)'
                , _qry
                , string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                           , ', ' ORDER BY attnum))
    FROM   pg_attribute
    WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
    AND    attnum > 0
    AND    NOT attisdropped);
END
$func$  LANGUAGE plpgsql;

Overload with this variant for the 2-parameter form:

CREATE OR REPLACE FUNCTION crosstab_n(_qry text, _cat_qry text, _rowtype anyelement)
  RETURNS SETOF anyelement AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
   (SELECT format('SELECT * FROM crosstab(%L, %L) t(%s)'
                , _qry, _cat_qry
                , string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                           , ', ' ORDER BY attnum))
    FROM   pg_attribute
    WHERE  attrelid = pg_typeof(_rowtype)::text::regclass
    AND    attnum > 0
    AND    NOT attisdropped);
END
$func$  LANGUAGE plpgsql;

pg_typeof(_rowtype)::text::regclass: Every user-defined composite type has a companion entry in the system catalog pg_class, so its attributes (columns) are listed in the system catalog pg_attribute, just like the columns of a table. The fast lane to get that entry: cast the registered type (regtype) returned by pg_typeof() to text and cast this text to regclass.
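
The catalog lookup can be run on its own to see the column definition list that crosstab_n() splices into its dynamic query. A sketch using a hypothetical composite type demo_xtype_3 (note that CREATE TYPE creates a persistent object, not a temporary one):

```sql
-- Hypothetical composite type, for this illustration only.
CREATE TYPE demo_xtype_3 AS (row_name text, val1 int, val2 int, val3 int);

-- Same lookup as in crosstab_n(): regtype -> text -> regclass -> pg_attribute.
SELECT string_agg(quote_ident(attname) || ' ' || atttypid::regtype
                , ', ' ORDER BY attnum) AS column_definition_list
FROM   pg_attribute
WHERE  attrelid = pg_typeof(NULL::demo_xtype_3)::text::regclass
AND    attnum > 0
AND    NOT attisdropped;
-- Returns: row_name text, val1 integer, val2 integer, val3 integer
```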

Each return type you are going to use has to be defined once:

CREATE TYPE tablefunc_crosstab_int_3 AS (
    row_name text, val1 int, val2 int, val3 int);

CREATE TYPE tablefunc_crosstab_int_4 AS (
    row_name text, val1 int, val2 int, val3 int, val4 int);

...

For ad-hoc calls, you can also just create a temporary table to the same (temporary) effect:

CREATE TEMP TABLE temp_xtype7 (
    row_name text, x1 int, x2 int, x3 int, x4 int, x5 int, x6 int, x7 int);

Or use the type of an existing table, view or materialized view if available.

Calls using the row types above:

1-parameter form (no missing values):

SELECT * FROM crosstab_n(
   'SELECT row_name, attrib, val FROM tbl ORDER BY 1,2'
 , NULL::tablefunc_crosstab_int_3);

2-parameter form (some values can be missing):

SELECT * FROM crosstab_n(
   'SELECT row_name, attrib, val FROM tbl ORDER BY 1'
 , $$VALUES ('val1'), ('val2'), ('val3')$$
 , NULL::tablefunc_crosstab_int_3);

This one function works for all return types, while the crosstabN() framework provided by the tablefunc module needs a separate function for each.
If you have named your types in sequence as demonstrated above, you only have to replace the number at the end of the type name (the 3 in tablefunc_crosstab_int_3). To find the maximum number of categories in the base table:

SELECT max(count(*)) OVER () FROM tbl  -- returns 3
GROUP  BY row_name
LIMIT  1;
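
With types named in sequence, the maximum count maps directly to a type name. A sketch with a hypothetical copy of tbl (demo_tbl); the resulting name is only useful if that type was actually created beforehand:

```sql
-- Hypothetical copy of the simpler test case.
CREATE TEMP TABLE demo_tbl (row_name text, attrib text, val int);
INSERT INTO demo_tbl (row_name, attrib, val) VALUES
   ('A', 'val1', 10), ('A', 'val2', 20)
 , ('B', 'val1', 3),  ('B', 'val2', 4)
 , ('C', 'val1', 5)
 , ('D', 'val3', 8),  ('D', 'val1', 6), ('D', 'val2', 7);

-- Largest number of values per row_name, turned into the sequential type name.
SELECT format('tablefunc_crosstab_int_%s', max(n)) AS type_name
FROM  (SELECT count(*) AS n FROM demo_tbl GROUP BY row_name) sub;
-- Returns: tablefunc_crosstab_int_3
```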

That's about as dynamic as this gets if you want individual columns. Arrays as demonstrated by @Clodoaldo, a simple text representation, or the result wrapped in a document type like json or hstore can work for any number of categories dynamically.
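
A sketch of the array and json variants, using a hypothetical copy of the question's data (demo_org_arr). Neither needs a column definition list, so any number of values per bar works:

```sql
-- Hypothetical copy of the question's data.
CREATE TEMP TABLE demo_org_arr (id int, feh int, bar text);
INSERT INTO demo_org_arr (id, feh, bar) VALUES
   (1, 10, 'A'), (2, 20, 'A'), (3, 3, 'B'), (4, 4, 'B')
 , (5, 5, 'C'), (6, 6, 'D'), (7, 7, 'D'), (8, 8, 'D');

-- One array per bar, values in feh order: A -> {10,20}, D -> {6,7,8}.
SELECT bar, array_agg(feh ORDER BY feh) AS vals
FROM   demo_org_arr
GROUP  BY bar
ORDER  BY bar;

-- One json object per bar, keyed by synthesized names val1, val2, ...
SELECT bar, json_object_agg('val' || rn, feh ORDER BY rn) AS vals
FROM  (
   SELECT bar, feh, row_number() OVER (PARTITION BY bar ORDER BY feh) AS rn
   FROM   demo_org_arr
   ) sub
GROUP  BY bar
ORDER  BY bar;
```

The array form keeps the values positional, like val1, val2, ... in the pivot; the json form keeps the synthesized names as keys.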

Disclaimer:
It's always potentially dangerous when user input is converted to code. Make sure this cannot be used for SQL injection. Don't accept input from untrusted users (directly).

Applied to the question's table tbl_org (dummy category, values ordered by feh):

SELECT * FROM crosstab_n('SELECT bar, 1, feh FROM tbl_org ORDER BY 1,3'
                       , NULL::tablefunc_crosstab_int_3);