Redshift 将单个动态列拆分为新表中的多行 [英] Redshift split single dynamic column into multiple rows in new table

查看:45
本文介绍了Redshift 将单个动态列拆分为新表中的多行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

使用如下表格:

       uid               |            segmentids
-------------------------+----------------------------------------
 f9b6d54b-c646-4bbb-b0ec | 4454918|4455158|4455638|4455878|4455998
 asd7a0s9-c646-asd7-b0ec | 1265899|1265923|1265935|1266826|1266596
 gd3355ff-cjr8-assa-fke0 | 2237557|2237581|2237593
 laksnfo3-kgi5-fke0-b0ec | 4454918|4455158|4455638|4455878

如何创建一个新表:

      uid               |         segmentids
-------------------------+---------------------------
 f9b6d54b-c646-4bbb-b0ec |           4454918
 f9b6d54b-c646-4bbb-b0ec |           1265899
 f9b6d54b-c646-4bbb-b0ec |           2237557
 f9b6d54b-c646-4bbb-b0ec |           4454918
 f9b6d54b-c646-4bbb-b0ec |           4454918
 asd7a0s9-c646-asd7-b0ec |           1265899
 asd7a0s9-c646-asd7-b0ec |           1265923
 asd7a0s9-c646-asd7-b0ec |           1265935 
 asd7a0s9-c646-asd7-b0ec |           1266826
 asd7a0s9-c646-asd7-b0ec |           1266596

段数是动态的,可以随每条记录而变化.我尝试了带分隔符的 Split 函数,但它需要字符串中的索引,这里是动态的.

The number of segments are dynamic, can vary with each record. I tried the Split function with delimiter, but it requires the index in string, which is dynamic here.

有什么建议吗?

推荐答案

这是 Redshift 的答案,它可以处理每行多达 10,000 个段 ID 值.

Here is the Redshift answer, it will work with up to 10 thousand segment ids values per row.

测试数据

create table test_split (uid varchar(50),segmentids varchar(max));
insert into test_split
    values
      ('f9b6d54b-c646-4bbb-b0ec','4454918|4455158|4455638|4455878|4455998'),
      ('asd7a0s9-c646-asd7-b0ec','1265899|1265923|1265935|1266826|1266596'),
      ('asd7345s9-c646-asd7-b0ec','1235935|1263456|1265675696'),
      ('as345a0s9-c646-asd7-b0ec','12765899|12658883|12777935|144466826|1266226|12345')
;

代码

with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
  , generted_numbers AS
(
    SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
    FROM ten_numbers AS t1
      JOIN ten_numbers AS t2 ON 1 = 1
      JOIN ten_numbers AS t3 ON 1 = 1
      JOIN ten_numbers AS t4 ON 1 = 1
)
  , splitter AS
(
    SELECT *
    FROM generted_numbers
    WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(segmentids, '\|') + 1)
                                 FROM test_split)
)
  --select * from splitter;
  , expanded_input AS
(
    SELECT
      uid,
      split_part(segmentids, '|', s.gen_num) AS segment
    FROM test_split AS ts
      JOIN splitter AS s ON 1 = 1
    WHERE split_part(segmentids, '|', s.gen_num) <> ''
)
SELECT * FROM expanded_input;

前两个 cte 步骤(ten_numbers 和 generated_numbers)用于生成行数,这是必需的,因为不支持 generate_series

the first 2 cte steps (ten_numbers and generated_numbers) are used to generate a number of rows, this is needed because generate_series is not supported

下一步(拆分器)只需要行数等于最大分隔符数 + 1(这是最大段数)

The next step (splitter) just takes a number of rows equal to the max number of delimiters + 1 (which is the max number of segments)

最后,我们将 splitter 与输入数据交叉连接,使用 split_part 获取相关值,然后排除空白部分(这是由于行具有 <最大段数)

finally, we cross join splitter with the input data, take the related value using split_part and then exclude blank parts (which are caused where the row has < the max number of segments)

这篇关于Redshift 将单个动态列拆分为新表中的多行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆