Pivot on Multiple Columns using Tablefunc
Question
Has anyone used tablefunc
to pivot on multiple variables as opposed to only using row name? The documentation notes:
The "extra" columns are expected to be the same for all rows with the same row_name value.
I'm not sure how to do this without combining the columns that I want to pivot on (which I highly doubt will give me the speed I need). One possible way to do this would be to make the entity numeric and add it to the localt as milliseconds, but this seems like a shaky way to proceed.
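To be concrete, the workaround I was considering would look roughly like this; it's only a sketch, and `t4_numeric` (a hypothetical variant of `t4` with an integer `entity_id`) is assumed:

```sql
-- Sketch of the workaround: fold a numeric entity id into the timestamp
-- as milliseconds, so (timeof, entity) collapses into a single row_name.
SELECT timeof + (entity_id * interval '1 millisecond') AS row_name
     , status, ct
FROM   t4_numeric  -- hypothetical table with entity_id int
ORDER  BY 1, 2;
```

Recovering the original entity from the composite key afterwards is exactly the kind of fragility I'd like to avoid.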
I've edited the data used in a response to this question: PostgreSQL Crosstab Query.
CREATE TEMP TABLE t4 (
timeof timestamp
,entity character
,status integer
,ct integer);
INSERT INTO t4 VALUES
('2012-01-01', 'a', 1, 1)
,('2012-01-01', 'a', 0, 2)
,('2012-01-02', 'b', 1, 3)
,('2012-01-02', 'c', 0, 4);
SELECT * FROM crosstab(
'SELECT timeof, entity, status, ct
FROM t4
ORDER BY 1,2,3'
,$$VALUES (1::text), (0::text)$$)
AS ct ("Section" timestamp, "Attribute" character, "1" int, "0" int);
Returns:
 Section             | Attribute | 1 | 0
---------------------+-----------+---+---
 2012-01-01 00:00:00 | a         | 1 | 2
 2012-01-02 00:00:00 | b         | 3 | 4
So as the documentation states, the extra column aka 'Attribute' is assumed to be the same for each row name aka 'Section'. Thus, it reports b for the second row even though 'entity' also has a 'c' value for that 'timeof' value.
Desired Output:
Section | Attribute | 1 | 0
--------------------------+-----------+---+---
2012-01-01 00:00:00 | a | 1 | 2
2012-01-02 00:00:00 | b | 3 |
2012-01-02 00:00:00 | c | | 4
Any thoughts or references?
A little more background: I potentially need to do this for billions of rows and I'm testing out storing this data in long and wide formats and seeing if I can use tablefunc
to go from long to wide format more efficiently than with regular aggregate functions.
I'll have about 100 measurements made every minute for around 300 entities. Often, we will need to compare the different measurements made for a given second for a given entity, so we will need to go to wide format very often. Also, the measurements made on a particular entity are highly variable.
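For reference, the long-format table used in the answer below (`test`, with columns `localt`, `entity`, `msrmnt`, `val`) might be laid out like this; the column names match the crosstab query, but the exact types are my assumption:

```sql
-- Assumed long-format layout for the measurement data.
CREATE TABLE test (
  localt timestamp NOT NULL  -- time of measurement
, entity int       NOT NULL  -- ~300 distinct entities
, msrmnt int       NOT NULL  -- measurement id, ~100 taken per minute
, val    float8              -- measured value
);
```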
EDIT: I found a resource on this: http://www.postgresonline.com/journal/categories/24-tablefunc.
The problem with your query is that b
and c
share the same timestamp 2012-01-02 00:00:00
, and you have the timestamp
column timeof
first in your query, so - even though you added bold emphasis - b
and c
are just extra columns that fall in the same group 2012-01-02 00:00:00
. Only the first (b
) is returned since (quoting the manual):
The row_name
column must be first. The category
and value
columns must be the last two columns, in that order. Any columns between row_name
and category
are treated as "extra". The "extra" columns are expected to be the same for all rows with the same row_name
value.
Bold emphasis mine.
Just revert the order of the first two columns to make entity
the row name and it works as desired:
SELECT * FROM crosstab(
'SELECT entity, timeof, status, ct
FROM t4
ORDER BY 1'
,'VALUES (1), (0)')
AS ct (
"Attribute" character
,"Section" timestamp
,"status_1" int
,"status_0" int);
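Run against the sample data above, this should return (the desired output, with the first two columns swapped):

```
 Attribute |       Section       | status_1 | status_0
-----------+---------------------+----------+----------
 a         | 2012-01-01 00:00:00 |        1 |        2
 b         | 2012-01-02 00:00:00 |        3 |
 c         | 2012-01-02 00:00:00 |          |        4
```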
entity
must be unique, of course.
To reiterate:

- row_name first
- (optional) extra columns next
- category (as defined by the second parameter) and value last
Extra columns are filled from the first row of each row_name
partition. Values from other rows are ignored; there is only one column per row_name
to fill. Typically those would be the same for every row of one row_name
, but that's up to you.
For the different setup in your answer:
SELECT localt, entity
, msrmnt01, msrmnt02, msrmnt03, msrmnt04, msrmnt05 -- , more?
FROM crosstab(
'SELECT dense_rank() OVER (ORDER BY localt, entity)::int AS row_name
, localt, entity -- additional columns
, msrmnt, val
FROM test
-- WHERE ??? -- instead of LIMIT at the end
ORDER BY localt, entity, msrmnt
-- LIMIT ???' -- instead of LIMIT at the end
, $$SELECT generate_series(1,5)$$) -- more?
AS ct (row_name int, localt timestamp, entity int
, msrmnt01 float8, msrmnt02 float8, msrmnt03 float8, msrmnt04 float8, msrmnt05 float8 -- , more?
)
LIMIT 1000 -- ??!!
No wonder the queries in your test perform terribly. Your test setup has 14M rows and you process all of them before throwing most of it away with LIMIT 1000
. For a reduced result set add WHERE
conditions or a LIMIT
to the source query!
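For instance, the filter can be moved into the source-query string itself, so that crosstab() never sees the rows you would otherwise discard. This is only a sketch; the date range is a hypothetical placeholder:

```sql
SELECT localt, entity
     , msrmnt01, msrmnt02, msrmnt03, msrmnt04, msrmnt05
FROM crosstab(
   $q$
   SELECT dense_rank() OVER (ORDER BY localt, entity)::int AS row_name
        , localt, entity          -- additional columns
        , msrmnt, val
   FROM   test
   WHERE  localt >= '2012-01-01'  -- hypothetical filter
   AND    localt <  '2012-01-02'
   ORDER  BY localt, entity, msrmnt
   $q$
 , 'SELECT generate_series(1,5)')
AS ct (row_name int, localt timestamp, entity int
     , msrmnt01 float8, msrmnt02 float8, msrmnt03 float8
     , msrmnt04 float8, msrmnt05 float8);
```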
Plus, the array you work with is needlessly expensive on top of that. I generate a surrogate row name with dense_rank() instead.
db<>fiddle here - with a simpler test setup and fewer rows.