在RedShift中将值拆分为多行 [英] Split values over multiple rows in RedShift

查看:433
本文介绍了在RedShift中将值拆分为多行的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

有关如何将字段(例如CSV字符串)拆分为多行的问题已得到解答: 将值拆分为多行.

The question of how to split a field (e.g. a CSV string) into multiple rows has already been answered: Split values over multiple rows.

但是,此问题涉及MSSQL,答案使用的是与RedShift等效的各种功能.

However, this question refers to MSSQL, and the answers use various features for which there are no RedShift equivalents.

为了完整起见,下面是我想做的事的一个例子:

For the sake of completeness, here's an example of what I'd like to do:

当前数据:

| Key | Data     |
+-----+----------+
| 1   | 18,20,22 |
| 2   | 17,19    |

所需数据:

| Key | Data     |
+-----+----------+
| 1   | 18       |
| 1   | 20       |
| 1   | 22       |
| 2   | 17       |
| 2   | 19       |

现在,对于在CSV字段中元素数量有限的情况,我可以建议一种解决方法:对所有可能的数组位置使用split_part和union,如下所示:

Now, I can suggest a walkaround for the case of small, bounded number of elements in the CSV field: use split_part and union over all possible array locations, like so:

SELECT Key, split_part(Data, ',', 1) 
FROM mytable
WHERE split_part(Data, ',', 1) != ""
    UNION
SELECT Key, split_part(Data, ',', 2) 
FROM mytable
WHERE split_part(Data, ',', 2) != ""
-- etc. etc.

但是,这显然效率很低,并且不适用于较长的列表.

However, this is obviously very inefficient, and would not work for longer lists.

关于如何执行此操作的任何更好的主意?

Any better ideas on how to do this?

关于乘法行,还有一个类似的问题:在Redshift中拆分行.但是,我不知道如何在这里应用这种方法.

There's also a somewhat similar question regarding multiplying rows: splitting rows in Redshift. However I don't see how this approach can be applied here.

可能的重复项: Redshift.将逗号分隔的值转换为行.但是没有什么新鲜的-宫崎骏(Masashi Miyazaki)的答案与我上面的建议类似,并且存在相同的问题.

A possible duplicate: Redshift. Convert comma delimited values into rows. But nothing new - the answer by @Masashi Miyazaki is similar to my suggestion above, and suffers from the same issues.

推荐答案

这是Redshift答案,它每行最多可处理1万个值.

Here is the Redshift answer, it will work with up to 10 thousand values per row.

设置测试数据

create table test_data (key varchar(50),data varchar(max));
insert into test_data
    values
      (1,'18,20,22'),
      (2,'17,19')
;

代码

with ten_numbers as (select 1 as num union select 2 union select 3 union select 4 union select 5 union select 6 union select 7 union select 8 union select 9 union select 0)
  , generted_numbers AS
(
    SELECT (1000 * t1.num) + (100 * t2.num) + (10 * t3.num) + t4.num AS gen_num
    FROM ten_numbers AS t1
      JOIN ten_numbers AS t2 ON 1 = 1
      JOIN ten_numbers AS t3 ON 1 = 1
      JOIN ten_numbers AS t4 ON 1 = 1
)
  , splitter AS
(
    SELECT *
    FROM generted_numbers
    WHERE gen_num BETWEEN 1 AND (SELECT max(REGEXP_COUNT(data, '\\,') + 1)
                                 FROM test_data)
)
  , expanded_input AS
(
    SELECT
      key,
      split_part(data, ',', s.gen_num) AS data
    FROM test_data AS td
      JOIN splitter AS s ON 1 = 1
    WHERE split_part(data, ',', s.gen_num) <> ''
)
SELECT * FROM expanded_input
order by key,data;

这篇关于在RedShift中将值拆分为多行的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆