数千列的动态枢轴 [英] Dynamic pivot for thousands of columns

查看:77
本文介绍了数千列的动态枢轴的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在使用pgAdmin III/PostgreSQL 9.4来存储和处理我的数据.我当前数据的示例:

I'm using pgAdmin III / PostgreSQL 9.4 to store and work with my data. Sample of my current data:

x | y
--+--
0 | 1
1 | 1
2 | 1
5 | 2
5 | 2
2 | 2
4 | 3
6 | 3
2 | 3

我希望它如何格式化:

1, 2, 3-列名是唯一的y
0, 5, 4-相应的第一个x
1, 5, 6-第二个相应的x
2, 2, 2-等等

1, 2, 3 -- column names are unique y values
0, 5, 4 -- the first respective x values
1, 5, 6 -- the second respective x values
2, 2, 2 -- etc.

它必须是动态的,因为我有y的数百万行和数千个唯一值.

It would need to be dynamic because I have millions of rows and thousands of unique values for y.

使用动态枢轴方法是否正确?我无法成功实现此目标:

Is using a dynamic pivot approach correct for this? I have not been able to successfully implement this:

DECLARE @columns VARCHAR(8000)

SELECT @columns = COALESCE(@columns + ',[' + cast(y as varchar) + ']',
'[' + cast(y as varchar)+ ']')
FROM tableName
GROUP BY y

DECLARE @query VARCHAR(8000)

SET @query = '
SELECT x
FROM tableName
PIVOT
(
MAX(x)
FOR [y]
IN (' + @columns + ')
)
AS p'

EXECUTE(@query)

它在第一行停止并给出错误:

It is stopping on the first line and giving the error:

syntax error at or near "@"

我所见过的所有动态数据透视示例都使用此示例,因此我不确定自己做错了什么.任何帮助表示赞赏.谢谢您的时间.

All dynamic pivot examples I've seen use this, so I'm not sure what I've done wrong. Any help is appreciated. Thank you for your time.

**注:重要的是,x值以正确的顺序存储,因为顺序很重要.如有必要,我可以添加另一列以指示顺序.

**Note: It is important for the x values to be stored in the correct order, as sequence matters. I can add another column to indicate sequential order if necessary.

推荐答案

第一 ,当您说第一行"时,您假设行的自然顺序是在数据库表中不存在.因此,是的,您需要像已经怀疑的那样add another column to indicate sequential order.我为此目的假设一列 tbl_id . -除非您要默认为穷人的最后一个(不可靠的)度假胜地,否则: ctid

First, when you say "first row", you assume a natural order of rows, which does not exist in database tables. So, yes, you need to add another column to indicate sequential order like you already suspected. I am assuming a column tbl_id for the purpose. - Unless you want to default to the poor man's last (unreliable) resort: ctid

下一步 ,您显示的代码类似于MS SQL Server代码,对于Postgres完全无效.

Next, the code you present looks like MS SQL Server code, which is completely invalid for Postgres.

最后 ,对于millions of rows and thousands of unique values for Y,尝试返回单个列甚至都没有意义. Postgres有很大的限制,但还不够慷慨.引用 Postgres关于" :

Finally, for millions of rows and thousands of unique values for Y it wouldn't even make sense to try and return individual columns. Postgres has generous limits, but not nearly generous enough for that. Quoting Postgres "About":

每个表的最大列数250-1600,具体取决于列类型

Maximum Columns per Table 250 - 1600 depending on column types

因此,我们甚至没有机会讨论SQL的限制性特征,这种限制性特征要求知道列及其数据类型在执行时就不能在执行期间动态调整.因此,您需要两个单独的通话,就像我们在相关问题下进行的详细讨论一样.

So we don't even get the chance to discuss the restrictive characteristics of SQL, which demands to know columns and their data types at execution time, not dynamically adjusted during execution. So yu would need two separate calls, like we discussed in great detail under this related question.

您还将在同一问题下找到@Clodoaldo的替代返回数组.实际上,它可以是 完全动态 .这也是我在这里建议的. 查询实际上非常简单:

You'll also find an alternative returning arrays by @Clodoaldo under the same question. That can actually be completely dynamic. And that's what I suggest here, too. The query is actually rather simple:

WITH cte AS (
   SELECT *, row_number() OVER (PARTITION BY y ORDER BY tbl_id) AS rn
   FROM   tbl
   ORDER  BY y, tbl_id
   )
SELECT text 'y' AS col, array_agg (y) AS values
FROM   cte
WHERE  rn = 1

UNION ALL
(  -- parentheses required
SELECT text 'x' || rn, array_agg (x)
FROM   cte
GROUP  BY rn
ORDER  BY rn
);

结果:

col | values
----+--------
y   | {1,2,3}
x1  | {0,5,4}
x2  | {1,5,6}
x3  | {2,2,2}

SQL提琴.

  • CTE为每组y的每一行(每个x)计算row_number rn.我们将使用它两次,因此使用了CTE.

  • The CTE computes a row_number rn for each row (each x) per group of y. We are going to use it twice, hence the CTE.

外部查询中的第一个SELECT生成y值的数组.

The 1st SELECT in the outer query generates the array of y values.

外部查询中的第二个SELECT会按顺序生成所有x值的数组.数组可以具有不同的长度.

The 2nd SELECT in the outer query generates all arrays of x values in order. Arrays can have different length.

为什么要为UNION ALL加上括号?

这篇关于数千列的动态枢轴的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆