Redshift: Convert comma-delimited values into rows

Question

I am wondering how to convert comma-delimited values into rows in Redshift. I am afraid that my own solution isn't optimal. Please advise. I have a table in which one of the columns holds comma-separated values. For example:

I have:

user_id|user_name|user_action
-----------------------------
1      | Shone   | start,stop,cancell...

I want to see:

user_id|user_name|parsed_action 
------------------------------- 
1      | Shone   | start        
1      | Shone   | stop         
1      | Shone   | cancell      
....

Recommended answer

A slight improvement over the existing answer is to use a second "numbers" table that enumerates all of the possible list lengths, and then use a cross join to make the query more compact.

As far as I am aware, Redshift does not have a straightforward method for creating a numbers table, but we can use a bit of a hack from https://www.periscope.io/blog/generate-series-in-redshift-and-mysql.html to create one using row numbers.

Specifically, if we assume the number of rows in cmd_logs is larger than the maximum number of commas in the user_action column, we can create a numbers table by counting rows. To start, let's assume there are at most 99 commas in the user_action column:

select 
  (row_number() over (order by true))::int as n
into numbers
from cmd_logs
limit 100;

If we want to get fancy, we can compute the number of commas from the cmd_logs table to create a more precise set of rows in numbers:

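select
  n::int
into numbers
from
  -- one row number per row of cmd_logs
  (select 
      row_number() over (order by true) as n
   from cmd_logs) as r
cross join
  -- the largest comma count found in any user_action value
  (select 
      max(regexp_count(user_action, '[,]')) as max_num 
   from cmd_logs) as m
where
  n <= max_num + 1;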
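
Both versions rely on cmd_logs having at least as many rows as the longest list has items. If it does not, numbers comes up short and the trailing items of long lists are dropped silently in the next step. A quick sanity check, sketched here on the assumption that numbers was built by one of the statements above, is to compare the two counts:

```sql
-- numbers_rows should be greater than or equal to longest_list;
-- if it is smaller, the numbers table does not cover every list position.
select
  (select count(*) from numbers) as numbers_rows,
  (select max(regexp_count(user_action, ',')) + 1 from cmd_logs) as longest_list;
```

The column aliases numbers_rows and longest_list are only illustrative names.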

Once there is a numbers table, we can do:

select
  user_id, 
  user_name, 
  split_part(user_action,',',n) as parsed_action 
from
  cmd_logs
cross join
  numbers
where
  split_part(user_action,',',n) is not null
  and split_part(user_action,',',n) != '';
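
To try the idea end to end without touching real tables, here is a minimal sketch using the row from the question. The table names cmd_logs_demo and numbers_demo are hypothetical, the column types are guesses, and the sample value keeps only the three actions shown in the question. The numbers table is filled by hand and deliberately holds one value more than the list has items, so the empty-string filter has something to remove:

```sql
-- Hypothetical demo tables, named so that nothing real is overwritten.
create temp table cmd_logs_demo (
  user_id     int,
  user_name   varchar(32),
  user_action varchar(256)
);

insert into cmd_logs_demo values
  (1, 'Shone', 'start,stop,cancell');

-- Positions 1..4 for a three-item list; position 4 is past the end.
create temp table numbers_demo (n int);
insert into numbers_demo values (1), (2), (3), (4);

select
  user_id,
  user_name,
  split_part(user_action, ',', n) as parsed_action
from
  cmd_logs_demo
cross join
  numbers_demo
where
  -- split_part returns an empty string for positions past the end of the list
  split_part(user_action, ',', n) != ''
order by
  user_id, n;
```

This should return the three rows from the question (start, stop, cancell). The order by is added here only to make the output order predictable.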
