Dynamically generate columns in PostgreSQL


Problem description

I have seen that there are quite a few similar questions like this one, but I haven't understood how to code it myself. Please keep in mind that I am just a beginner in this field.

Basically I want to pivot the table like this:

zoom |    day     | point         zoom | 2015-10-01 |  2015-10-02 | ......
------+-----------+-------  ---> ------+------------+-------------+
   1 | 2015-10-01 |   201            1 |    201     |     685     |
   2 | 2015-10-01 |    43            2 |     43     |     346     | 
   3 | 2015-10-01 |    80            3 |     80     |     534     | 
   4 | 2015-10-01 |   324            4 |    324     |     786     | 
   5 | 2015-10-01 |    25            5 |     25     |     685     |
   1 | 2015-10-02 |   685 
   2 | 2015-10-02 |   346 
   3 | 2015-10-02 |   534 
   4 | 2015-10-02 |   555 
   5 | 2015-10-02 |   786
   :
   :
   :

The time range can vary.

I get the results on the left with:

SELECT 
zoom,
to_char(date_trunc('day', time), 'YYYY-MM-DD') AS day,
count(*) as point
FROM province
WHERE time >= '2015-05-01' AND time < '2015-06-01'
GROUP BY to_char(date_trunc('day', time), 'YYYY-MM-DD'), zoom;

I have read that there are some issues if I use count, and that it would be better to use CASE and GROUP BY. However, I have no idea how to apply CASE here.

crosstab itself doesn't support dynamic creation of column names, but that can be achieved with crosstab_hash, if I understood it correctly.

This might be a nice solution: http://okbob.blogspot.ca/2008/08/using-cursors-for-generating-cross.html. However, I got stuck trying to program it myself.

I have to use this kind of pivoting quite often, so I would appreciate any kind of help and additional explanation behind it.

Edit1

I am trying to figure out how crosstab works with dates, currently without returning dynamic column names. Later on I will explain why. It is related to the main question. For this example I am using a period of only 2 dates.

Based on @Erwin Brandstetter's answer:

SELECT * FROM crosstab(
       'SELECT zoom, day, point
        FROM   province
        ORDER  BY 1, 2'
      , $$VALUES ('2015-10-01'::date), ('2015-10-02')$$)
AS ct (zoom text, day1 int, day2 int);

The returned results are:

zoom |    day1    |    day2     | 
-----+------------+-------------+
   1 |    201     |     685     |
   2 |     43     |     346     | 
   3 |     80     |     534     | 
   4 |    324     |     786     | 

I am trying to get this:

zoom | 2015-10-01 |  2015-10-02 | 
-----+------------+-------------+
   1 |    201     |     685     |
   2 |     43     |     346     | 
   3 |     80     |     534     | 
   4 |    324     |     786     | 

but my query doesn't work:

SELECT *
FROM crosstab(
      'SELECT *
       FROM province
       ORDER  BY 1,2')
AS ct (zoom text, "2015-10-01" date, "2015-10-02" date);

ERROR:  return and sql tuple descriptions are incompatible

Edit1, Q1. Why doesn't this work, and how can I return results like that?

I have read the links that @Erwin Brandstetter provided, especially this one: Execute a dynamic crosstab query. I have copied/pasted his function:

CREATE OR REPLACE FUNCTION pivottab(_tbl regclass, 
                                    _row text, _cat text, 
                                    _expr text,
                                    _type regtype)  
RETURNS text AS
$func$
DECLARE
   _cat_list text;
   _col_list text;
BEGIN
-- generate categories for xtab param and col definition list    
EXECUTE format(
 $$SELECT string_agg(quote_literal(x.cat), '), (')
        , string_agg(quote_ident  (x.cat), %L)
   FROM  (SELECT DISTINCT %I AS cat FROM %s ORDER BY 1) x$$
 , ' ' || _type || ', ', _cat, _tbl)
INTO  _cat_list, _col_list;

-- generate query string
RETURN format(
'SELECT * FROM crosstab(
   $q$SELECT %I, %I, %s
      FROM   %I
      GROUP  BY 1, 2
      ORDER  BY 1, 2$q$
 , $c$VALUES (%5$s)$c$
   ) ct(%1$I text, %6$s %7$s)'
, _row, _cat, _expr, _tbl, _cat_list, _col_list, _type
);

END
$func$ LANGUAGE plpgsql;

and called it with the query:

SELECT pivottab('province','zoom','day','point','date');

The function returned:

                         pivottab                         
----------------------------------------------------------
 SELECT * FROM crosstab(                                 +
    $q$SELECT zoom, day, point                           +
       FROM   province                                   +
       GROUP  BY 1, 2                                    +
       ORDER  BY 1, 2$q$                                 +
  , $c$VALUES ('2015-10-01'), ('2015-10-02')$c$          +
    ) ct(zoom text, "2015-10-01" date, "2015-10-02" date)
(1 row)

So when I edited the query and added ; (it would be nice if ; were already there) I got:

ERROR:  column "province.point" must appear in the GROUP BY clause or be used in an aggregate function

Edit1, Q2. Any ideas how to solve this?

Edit1, Q3. I guess the next question will be how to execute the function automatically, which is also mentioned at the same link, but I got stuck on the previous steps.

Solution

The basic crosstab query for your example is simple:

SELECT * FROM crosstab(
       'SELECT zoom, day, point
        FROM   province
        ORDER  BY 1, 2'

     , $$VALUES ('2015-10-01'::date), ('2015-10-02')$$)
AS ct (zoom text, day1 int, day2 int);

But not with dynamic column names or a dynamic number of columns. As a compromise, you can have a fixed number of columns and only fill the leading ones. Basics:

Dynamic?

crosstab_hash is not going to help you with dynamic column names. It's for repeated use without typing a column definition list, but not for dynamic column names. Examples:

For truly dynamic column names, you need two round trips to the server. It does not matter whether you retrieve the column names with a first query to build a second query, or create a cursor, a temporary table, or a prepared statement: whatever you try, you need two round trips. SQL wants to know the return type at call time.
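The two round trips can be sketched from the client side. This is only an illustration under loud assumptions: it uses Python's standard `sqlite3` module instead of Postgres so that it runs without a server, and because SQLite has no `crosstab()`, the second query pivots with conditional aggregation (`CASE` inside `sum`) instead. The table and column names mirror the question; the sample rows are a subset of the question's table.

```python
import sqlite3

# Sample rows mirroring the question's left-hand table (zoom, day, point).
ROWS = [
    (1, "2015-10-01", 201), (2, "2015-10-01", 43), (3, "2015-10-01", 80),
    (1, "2015-10-02", 685), (2, "2015-10-02", 346), (3, "2015-10-02", 534),
]


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def pivot(conn):
    # Round trip 1: fetch the distinct category values (the future column names).
    days = [r[0] for r in conn.execute(
        "SELECT DISTINCT day FROM province ORDER BY 1")]

    # Build the second query: one conditional aggregate per day.
    # The day values are bound as parameters; only the column aliases
    # are spliced into the SQL text, after quoting.
    cols = ", ".join(
        "sum(CASE WHEN day = ? THEN point END) AS {}".format(quote_ident(d))
        for d in days
    )
    sql = "SELECT zoom, {} FROM province GROUP BY zoom ORDER BY zoom".format(cols)

    # Round trip 2: run the generated query; its column list is now known.
    cur = conn.execute(sql, days)
    header = [c[0] for c in cur.description]
    return header, cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE province (zoom INTEGER, day TEXT, point INTEGER)")
conn.executemany("INSERT INTO province VALUES (?, ?, ?)", ROWS)

header, rows = pivot(conn)
print(header)
for row in rows:
    print(row)
```

The header comes back as `['zoom', '2015-10-01', '2015-10-02']`. The column names are only known after the first query has run, which is why a single statement cannot produce them.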

The closest I could get to a "dynamic" call is with my custom crosstab_n() function defined in this related answer:


Or you give up the idea of a completely dynamic crosstab query (because, you know, it's impossible) and use a two-step workflow, like mentioned above.

  1. Let a function generate the crosstab query text. You can use the function provided here (and adapt it to your needs!):

    In particular, remove GROUP BY 1, 2, since you do not aggregate rows before the cross tabulation.

  2. Execute the generated query.
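Step 1 of this workflow can also be done on the client. Below is a minimal sketch, assuming Python: `build_crosstab_sql` is a hypothetical helper (not part of Postgres or of any library) that takes the category values fetched by a first query and returns the text of the crosstab query. It omits `GROUP BY 1, 2` as advised above, and it types the value columns as `int`, on the assumption that `point` is an integer; declaring them with a type that does not match the values (such as `date`) is a likely cause of the "return and sql tuple descriptions are incompatible" error in the question.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote an SQL string literal, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_crosstab_sql(table, row_col, cat_col, val_col, categories,
                       val_type="int"):
    """Return crosstab query text for the given, already fetched, categories.

    val_type is spliced in verbatim, so it must come from trusted code,
    never from user input.
    """
    # Category list for crosstab's second parameter: ('a'), ('b'), ...
    values = ", ".join("({})".format(quote_literal(c)) for c in categories)
    # Column definition list: one output column per category value.
    col_defs = ", ".join(
        "{} {}".format(quote_ident(c), val_type) for c in categories
    )
    # Source query: no GROUP BY, the rows are assumed to be pre-aggregated.
    source = "SELECT {}, {}, {} FROM {} ORDER BY 1, 2".format(
        quote_ident(row_col), quote_ident(cat_col),
        quote_ident(val_col), quote_ident(table),
    )
    return (
        "SELECT * FROM crosstab(\n"
        "   $q${}$q$\n"
        " , $c$VALUES {}$c$\n"
        "   ) ct({} text, {});"
    ).format(source, values, quote_ident(row_col), col_defs)


# The category values would come from a first query such as
# SELECT DISTINCT day FROM province ORDER BY 1
sql = build_crosstab_sql("province", "zoom", "day", "point",
                         ["2015-10-01", "2015-10-02"])
print(sql)
```

The printed text is what step 2 sends to the server, through psql or whichever client is in use. The trailing semicolon is included so the result can be pasted as-is.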


For completeness, there is also the new \crosstabview metacommand in psql in Postgres 9.6 (just released). It offers similar functionality and can display dynamic column names (attaching dynamic names happens in the psql client, not in the Postgres server).
