如何在Redshift中使用两个不同的定界符从两列中拆分数据? [英] How to split data from two columns using two different delimiters in Redshift?

查看:141
本文介绍了如何在Redshift中使用两个不同的定界符从两列中拆分数据?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个 CTE ,其中包含这样的数据.它几乎遵循两种格式,其中 counts process_ids 将具有这两种类型的数据.

I have a CTE that has data like this. It follows two formats pretty much where counts and process_ids will have these two types of data.

client_id      day              counts      process_ids
--------------------------------------------------------------------------------------------
abc1          Feb-01-2021        3        C1,C2 | C3,C4,C5 | C6,C7
abc2          Feb-05-2021       2, 3      C10,C11,C12 | C13,C14 # C15,C16 | C17,C18

现在,在按 counts process_ids -

client_id      day              counts      process_ids
--------------------------------------------------------
abc1           Feb-01-2021        3           C1
abc1           Feb-01-2021        3           C2
abc1           Feb-01-2021        3           C3
abc1           Feb-01-2021        3           C4
abc1           Feb-01-2021        3           C5
abc1           Feb-01-2021        3           C6
abc1           Feb-01-2021        3           C7
abc2           Feb-05-2021        2           C10
abc2           Feb-05-2021        2           C11
abc2           Feb-05-2021        2           C12
abc2           Feb-05-2021        2           C13
abc2           Feb-05-2021        2           C14
abc2           Feb-05-2021        3           C15
abc2           Feb-05-2021        3           C16
abc2           Feb-05-2021        3           C17
abc2           Feb-05-2021        3           C18

基本上,这个想法是根据以下两种使用情况(如果采用任何一种格式)来划分 counts process_ids .

Basically, the idea is to split counts and process_ids basis on the below two use cases if they follow any of those formats.

用例1

如果 counts 列仅包含一位数字,而 process_ids 列具有 | 分隔符.

If counts column only has single-digit and process_ids column has | delimiter.

用例2

如果 counts 列仅用两位数字分隔,并由分隔符和 process_ids 列具有#分隔符以及 pipe .

If counts column only has two-digit separated by a , delimiter and process_ids column has # delimiter along with pipe.

我在这里与 Amazon Redshift 一起工作,我对如何根据需要将它们分开感到困惑.

I am working with Amazon Redshift here and I am confused about how can I split them out as needed.

这有可能吗?

推荐答案

乍一看,这似乎有些毛茸茸,但它是通过扎实的技术构建而成的,并提供了理想的结果...

This might look a bit hairy at first sight but has been built up from solid techniques and gives the desired result...

WITH seq_0_9 AS (
  SELECT 0 AS d
  UNION ALL SELECT 1 AS d
  UNION ALL SELECT 2 AS d
  UNION ALL SELECT 3 AS d
  UNION ALL SELECT 4 AS d
  UNION ALL SELECT 5 AS d
  UNION ALL SELECT 6 AS d
  UNION ALL SELECT 7 AS d
  UNION ALL SELECT 8 AS d
  UNION ALL SELECT 9 AS d
),
numbers AS (
  SELECT a.d + b.d * 10 + c.d * 100 + 1 AS n
  FROM seq_0_9 a, seq_0_9 b, seq_0_9 c
),
processed AS
  (SELECT client_id,
          day,
          REPLACE(counts, ' ', '') AS counts,
          REPLACE(REPLACE(process_ids, ' ', ''), '|', ',') AS process_ids
   FROM tbl),
split_pids AS
  (SELECT
     client_id, 
     day,
     counts,
     split_part(process_ids, '#', n) AS process_ids,
     n AS n1
   FROM processed
   CROSS JOIN numbers
   WHERE 
     split_part(process_ids, '#', n) IS NOT NULL
     AND split_part(process_ids, '#', n) != ''),
split_counts AS
  (SELECT
     client_id, 
     day,
     split_part(counts, ',', n) AS counts,
     process_ids,
     n1,
     n AS n2
   FROM split_pids
   CROSS JOIN numbers
   WHERE
     split_part(counts, ',', n) IS NOT NULL
     and split_part(counts, ',', n) != ''),
matched_up AS
  (SELECT * FROM split_counts WHERE n1 = n2)
SELECT
  client_id, 
  day,
  counts,
  split_part(process_ids, ',', n) AS process_ids
FROM
  matched_up
CROSS JOIN
  numbers
WHERE
  split_part(process_ids, ',', n) IS NOT NULL
  AND split_part(process_ids, ',', n) != '';

演示

在线rextester演示(使用PostgreSQL,但应与Redshift兼容): https://rextester.com/FNA16497

此技术用于生成数字表(范围从1到1000).然后,此技术与多个

This technique is used to generate a numbers table (from 1 to 1000 inclusive). This technique is then used multiple times with multiple Common Table Expressions to achieve it in a single SQL statement.

这篇关于如何在Redshift中使用两个不同的定界符从两列中拆分数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆