Redshift将单个动态列拆分为新表中的多行 [英] Redshift split single dynamic column into multiple rows in new table
问题描述
使用如下表:
uid | segmentids
-------------------------+----------------------------------------
f9b6d54b-c646-4bbb-b0ec | 4454918|4455158|4455638|4455878|4455998
asd7a0s9-c646-asd7-b0ec | 1265899|1265923|1265935|1266826|1266596
gd3355ff-cjr8-assa-fke0 | 2237557|2237581|2237593
laksnfo3-kgi5-fke0-b0ec | 4454918|4455158|4455638|4455878
如何使用以下方法创建新表
How to create a new table with:
uid | segmentids
-------------------------+---------------------------
f9b6d54b-c646-4bbb-b0ec | 4454918
f9b6d54b-c646-4bbb-b0ec | 1265899
f9b6d54b-c646-4bbb-b0ec | 2237557
f9b6d54b-c646-4bbb-b0ec | 4454918
f9b6d54b-c646-4bbb-b0ec | 4454918
asd7a0s9-c646-asd7-b0ec | 1265899
asd7a0s9-c646-asd7-b0ec | 1265923
asd7a0s9-c646-asd7-b0ec | 1265935
asd7a0s9-c646-asd7-b0ec | 1266826
asd7a0s9-c646-asd7-b0ec | 1266596
段的数量是动态的,可以随每个记录而变化. 我尝试使用带分隔符的Split函数,但是它需要字符串中的索引,在这里是动态的.
The number of segments are dynamic, can vary with each record. I tried the Split function with delimiter, but it requires the index in string, which is dynamic here.
有什么建议吗?
推荐答案
这是Redshift答案,它将与每行多达1万个细分ID值一起使用.
Here is the Redshift answer, it will work with up to 10 thousand segment ids values per row.
测试数据
create table test_split (uid varchar(50),segmentids varchar(max));
insert into test_split
values
('f9b6d54b-c646-4bbb-b0ec','4454918|4455158|4455638|4455878|4455998'),
('asd7a0s9-c646-asd7-b0ec','1265899|1265923|1265935|1266826|1266596'),
('asd7345s9-c646-asd7-b0ec','1235935|1263456|1265675696'),
('as345a0s9-c646-asd7-b0ec','12765899|12658883|12777935|144466826|1266226|12345')
;
代码
with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
, generted_numbers AS
(
SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
FROM ten_numbers AS t1
JOIN ten_numbers AS t2 ON 1 = 1
JOIN ten_numbers AS t3 ON 1 = 1
JOIN ten_numbers AS t4 ON 1 = 1
)
, splitter AS
(
SELECT *
FROM generted_numbers
WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(segmentids, '\\|') + 1)
FROM test_split)
)
--select * from splitter;
, expanded_input AS
(
SELECT
uid,
split_part(segmentids, '|', s.gen_num) AS segment
FROM test_split AS ts
JOIN splitter AS s ON 1 = 1
WHERE split_part(segmentids, '|', s.gen_num) <> ''
)
SELECT * FROM expanded_input;
前2个cte步骤(ten_numbers和generate_numbers)用于生成许多行,这是必需的,因为不支持generate_series
the first 2 cte steps (ten_numbers and generated_numbers) are used to generate a number of rows, this is needed because generate_series is not supported
下一步(分隔符)只需要等于最大分隔符+ 1(即段的最大数目)的行数
The next step (splitter) just takes a number of rows equal to the max number of delimiters + 1 (which is the max number of segments)
最后,我们将输入数据与交叉分割器交叉,使用split_part获取相关值,然后排除空白部分(这是因为该行的段数小于或等于最大段数)
finally, we cross join splitter with the input data, take the related value using split_part and then exclude blank parts (which are caused where the row has < the max number of segments)
这篇关于Redshift将单个动态列拆分为新表中的多行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!