Split string into multiple columns with BigQuery

Question

I have a table in BigQuery with millions of rows, and I want to split the adx_catg_id column into multiple new columns. Note that adx_catg_id contains an arbitrary number of words separated by spaces.

The query below splits adx_catg_id into multiple columns, but only if the string contains fewer than five words. I could extend it to support more words, but I need to automate it.

-- Each level carries the earlier columns forward so the outer SELECT can see them.
SELECT
  TS, str0, str2, str4, str6, str7
  from
  (select TS, str0, str2, str4, str6, REGEXP_EXTRACT(str5, r'^(.*) .*') as str7
  from
  (select TS, str0, str2, str4, str5, SUBSTR(str5, LENGTH(REGEXP_EXTRACT(str5, r'^(.*) .*')) + 2, LENGTH(str5)) as str6
  from
  (select TS, str0, str2, str4, REGEXP_EXTRACT(str3, r'^(.*) .*') as str5
  from
  (select TS, str0, str2, str3, SUBSTR(str3, LENGTH(REGEXP_EXTRACT(str3, r'^(.*) .*')) + 2, LENGTH(str3)) as str4
  from
  (select TS, str0, str2, REGEXP_EXTRACT(str1, r'^(.*) .*') as str3
  from
  (select TS, str0, str1, SUBSTR(str1, LENGTH(REGEXP_EXTRACT(str1, r'^(.*) .*')) + 2, LENGTH(str1)) as str2
  from
  (select TS, str0, REGEXP_EXTRACT(TS, r'^(.*) .*') as str1
  from
  (select TS, SUBSTR(TS, LENGTH(REGEXP_EXTRACT(TS, r'^(.*) .*')) + 2, LENGTH(TS)) as str0
  from
  (select adx_catg_id TS from [mydataset.conversions])
  ))))))))

How can I loop the query above to generate a new column for every word, depending on the length of the string?

Recommended answer

Take a look at this:

SELECT
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){0}([^\s]*)\s?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){1}([^\s]*)\s?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){2}([^\s]*)\s?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){3}([^\s]*)\s?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){4}([^\s]*)\s?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){5}([^\s]*)\s?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){6}([^\s]*)\s?') as Word6,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){7}([^\s]*)\s?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){8}([^\s]*)\s?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){9}([^\s]*)\s?') as Word9,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){10}([^\s]*)\s?') as Word10,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){11}([^\s]*)\s?') as Word11,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){12}([^\s]*)\s?') as Word12,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)
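
To see what each WordN column should contain, the pattern can be tried outside BigQuery. The sketch below is only an illustration: it assumes the pattern is meant to use the whitespace class `\s` (that is, `^(?:[^\s]*\s){N}([^\s]*)\s?`), it runs the pattern with Python's `re` module rather than BigQuery's RE2 engine (the pattern uses only syntax the two share), and `nth_word` is a hypothetical helper name, not part of any BigQuery API.

```python
import re


def nth_word(text, n):
    """Return the n-th (0-based) whitespace-delimited word, or None if absent.

    Hypothetical helper: mirrors REGEXP_EXTRACT with the answer's pattern,
    assuming the character class is \\s (whitespace).
    """
    # Skip n "word + whitespace" groups, then capture the next run of non-whitespace.
    pattern = r'^(?:[^\s]*\s){%d}([^\s]*)\s?' % n
    match = re.search(pattern, text)
    return match.group(1) if match else None


sample = 'arbitrary number of words separated by space.'

# One entry per column Word0 .. Word12 from the query above.
words = [nth_word(sample, n) for n in range(13)]
for n, word in enumerate(words):
    print('Word%d = %r' % (n, word))
```

The sample string has seven words, so Word0 through Word6 are filled and Word7 through Word12 come back empty (None here, NULL in BigQuery). Note that two consecutive spaces count as an empty word in between, because `[^\s]*` is allowed to match nothing.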

Or, if you want the words in reverse order:

SELECT
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){1}$') as Word1,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){2}$') as Word2,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){3}$') as Word3,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){4}$') as Word4,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){5}$') as Word5,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){6}$') as Word6,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){7}$') as Word7,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)

It's still a fixed number of fields, but the code is simpler and more readable.
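
Since the number of columns has to be fixed when the query is written, one way to automate the repetitive part is to generate the query text with a small script. This is a minimal sketch, not the answerer's method: `build_split_query` and its parameters are hypothetical names, the pattern is assumed to be the `\s`-based one (`^(?:[^\s]*\s){N}([^\s]*)\s?`), and the column and table names are taken from the question. You still have to choose `max_words` yourself as an upper bound.

```python
def build_split_query(column, table, max_words):
    """Build query text with one REGEXP_EXTRACT column per word position.

    Hypothetical helper: it only assembles a string and does not talk to BigQuery.
    """
    columns = [
        # Raw string so the backslashes reach the generated SQL unchanged.
        r"  REGEXP_EXTRACT(%s, r'^(?:[^\s]*\s){%d}([^\s]*)\s?') AS Word%d"
        % (column, n, n)
        for n in range(max_words)
    ]
    return "SELECT\n%s\nFROM %s" % (",\n".join(columns), table)


# Column and table names as they appear in the question; 3 is an arbitrary example.
query = build_split_query("adx_catg_id", "[mydataset.conversions]", 3)
print(query)
```

Rows with fewer words than `max_words` simply get NULL in the extra columns, so it is safe to pick a generous upper bound.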

Hope this helps.
