AWS Redshift Pivot Table all Dimensions

Question

I am following this method to pivot a large table in Redshift:

Pivot a table with Amazon RedShift / PostgreSQL

However, I have a large number of groups to pivot (m1, m2, ...). How can I loop through all distinct values, apply the same logic to each of them, and alias the resulting column names?

Solution

If you want to be able to pivot to an arbitrary number of groups, you can combine the groups into a JSON string and then extract the groups you are interested in with the Redshift JSON functions. You probably do not want to do this for very large data sets.

Here is the basic idea based on the sample data in the question linked above:

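-- Outer select: pull individual metrics back out of the JSON string.
select DimensionA, DimensionB,
    json_extract_path_text(json_pivot, 'm1') as m1,
    json_extract_path_text(json_pivot, 'm2') as m2
from (
    -- Inner select: pack every MetricName/MetricValue pair of a dimension
    -- combination into one JSON object string.
    select DimensionA, DimensionB,
        '{' || listagg(quote_ident(MetricName) || ':' || quote_ident(MetricValue), ',')
               within group (order by MetricName) || '}' as json_pivot
    from to_pivot
    group by DimensionA, DimensionB
) as pivoted;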

In practice you would not want to run it like that. The inner select is what you would use to generate your "pivoted" table, and the outer select shows how to reference specific group values.
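To avoid typing one json_extract_path_text line per group, the outer select can be generated by a small script. The following is a minimal sketch, not part of the original answer: the function name build_pivot_query and its parameters are hypothetical, and it assumes the list of metric names was fetched beforehand (for example with select distinct MetricName from to_pivot) and that each name is a plain identifier that can double as a column alias.

```python
def build_pivot_query(metric_names, table="to_pivot",
                      dims=("DimensionA", "DimensionB")):
    """Return Redshift SQL with one json_extract_path_text column per metric.

    Hypothetical helper: metric_names is assumed to come from a prior
    `select distinct MetricName ...` query.
    """
    for name in metric_names:
        # Conservative guard: names are spliced into SQL text, so only
        # accept plain identifiers (no quotes, spaces or punctuation).
        if not name.isidentifier():
            raise ValueError(f"unsupported metric name: {name!r}")

    dim_list = ", ".join(dims)
    extracts = ",\n".join(
        f"    json_extract_path_text(json_pivot, '{name}') as {name}"
        for name in metric_names
    )
    return (
        f"select {dim_list},\n{extracts}\n"
        "from (\n"
        f"    select {dim_list},\n"
        "        '{' || listagg(quote_ident(MetricName) || ':' || "
        "quote_ident(MetricValue), ',')\n"
        "               within group (order by MetricName) || '}' as json_pivot\n"
        f"    from {table}\n"
        f"    group by {dim_list}\n"
        ") as pivoted;"
    )


if __name__ == "__main__":
    print(build_pivot_query(["m1", "m2", "m3"]))
```

The generated text can then be sent to Redshift with whatever client you already use. Names that are not plain identifiers are rejected rather than quoted, which keeps the sketch short at the cost of flexibility.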

This does not account for duplicate group records for the same dimension combination, like the following:

DimensionA  DimensionB  MetricName  MetricValue
----------  ----------  ----------  -----------
dimA1       dimB2       m1          v13
dimA1       dimB2       m1          v23

If that is a possibility in the data then you will have to figure out how to handle that. I am not sure how it would behave as implemented. My guess is the first occurrence would be extracted.

This could probably also be done with a combination of LISTAGG and REGEXP_SUBSTR, using two custom delimiters: one between name/value pairs and one between a name and its value.
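The two-delimiter idea can be illustrated outside the database. The sketch below mirrors the logic in Python rather than Redshift SQL: pack plays the role of LISTAGG and extract the role of the regular-expression lookup. The delimiters "|" and "=" and both function names are assumptions chosen for illustration, and the values are assumed never to contain either delimiter.

```python
import re

PAIR_SEP = "|"  # hypothetical delimiter between name/value pairs
KV_SEP = "="    # hypothetical delimiter between a name and its value


def pack(pairs):
    """Join (name, value) pairs into one delimited string, ordered by name."""
    body = PAIR_SEP.join(f"{name}{KV_SEP}{value}" for name, value in sorted(pairs))
    # Leading and trailing separators make every pair look the same,
    # so a lookup for "m1" cannot accidentally match inside "xm1".
    return PAIR_SEP + body + PAIR_SEP


def extract(packed, name):
    """Return the value stored under name, or None if it is absent."""
    pattern = (
        re.escape(PAIR_SEP + name + KV_SEP)
        + "([^" + re.escape(PAIR_SEP) + "]*)"
    )
    match = re.search(pattern, packed)
    return match.group(1) if match else None


if __name__ == "__main__":
    packed = pack([("m2", "v21"), ("m1", "v11")])
    print(packed)
    print(extract(packed, "m1"))
```

In this sketch the first match wins when a name appears twice. That is a property of the regular-expression search used here and says nothing about how the Redshift JSON functions treat duplicate keys.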

Using varchar(max) for the JSON column type will give 65535 bytes, which should be room for a couple thousand categories.
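As a rough check of that estimate, the number of categories that fit can be computed from the average name and value lengths. This is a back-of-the-envelope sketch under stated assumptions: single-byte characters, every name and value wrapped in double quotes, and a hypothetical helper name max_categories.

```python
REDSHIFT_VARCHAR_MAX_BYTES = 65535


def max_categories(name_bytes, value_bytes, limit=REDSHIFT_VARCHAR_MAX_BYTES):
    """Estimate how many "name":"value" entries fit in one JSON string."""
    # Each entry costs its name and value plus four quotes, a colon and a comma.
    entry = name_bytes + value_bytes + 6
    # The two braces are paid once; the last entry has no trailing comma.
    return (limit - 2 + 1) // entry


if __name__ == "__main__":
    # Ten-byte names and ten-byte values.
    print(max_categories(10, 10))
```

With ten-byte names and values this comes out at about 2,500 entries, consistent with "a couple thousand"; longer values shrink the figure proportionally.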

Explained slightly differently here.
