Split string into multiple columns with BigQuery
Problem description
I have a table in BigQuery with millions of rows, and I want to split the adx_catg_id column into multiple new columns. Note that the adx_catg_id column contains an arbitrary number of words separated by spaces.
The query below can split adx_catg_id into multiple columns, but only if the string contains fewer than five words. I could extend it by hand to support more words, but I need to automate this.
SELECT
TS, str0, str2, str4, str6, str7
from
(select *, REGEXP_EXTRACT(str5, r'^(.*) .*') as str7
from
(select *, SUBSTR (str5, LENGTH(REGEXP_EXTRACT(str5, r'^(.*) .*')) + 2, LENGTH(str5)) as str6
from
(select *, REGEXP_EXTRACT(str3, r'^(.*) .*') as str5
from
(select *, SUBSTR (str3, LENGTH(REGEXP_EXTRACT(str3, r'^(.*) .*')) + 2, LENGTH(str3)) as str4
from
(select *, REGEXP_EXTRACT(str1, r'^(.*) .*') as str3
from
(select *, SUBSTR (str1, LENGTH(REGEXP_EXTRACT(str1, r'^(.*) .*')) + 2, LENGTH(str1)) as str2
from
(select *, REGEXP_EXTRACT(TS, r'^(.*) .*') as str1
from
(select *, SUBSTR(TS, LENGTH(REGEXP_EXTRACT(TS, r'^(.*) .*')) + 2,LENGTH(TS)) as str0
from
(select adx_catg_id TS from [mydataset.conversions])
))))))))
How can I loop the above query so that it puts every word into its own new column, however many words the string contains?
Recommended answer
Take a look at this:
SELECT
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){0}([^\s]*)\s?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){1}([^\s]*)\s?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){2}([^\s]*)\s?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){3}([^\s]*)\s?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){4}([^\s]*)\s?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){5}([^\s]*)\s?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){6}([^\s]*)\s?') as Word6,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){7}([^\s]*)\s?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){8}([^\s]*)\s?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){9}([^\s]*)\s?') as Word9,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){10}([^\s]*)\s?') as Word10,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){11}([^\s]*)\s?') as Word11,
Regexp_extract(StringToParse,r'^(?:[^\s]*\s){12}([^\s]*)\s?') as Word12,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)
Or, if you want the words in reverse order:
SELECT
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){1}$') as Word1,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){2}$') as Word2,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){3}$') as Word3,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){4}$') as Word4,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){5}$') as Word5,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){6}$') as Word6,
Regexp_extract(StringToParse,r'\s?([^\s]*)(?:[^\s]*\s?){7}$') as Word7,
FROM
(SELECT 'arbitrary number of words separated by space.' as StringToParse)
It's still a fixed number of fields, but the code is simpler and more readable.
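As a possible alternative to the regular expressions, and assuming the query can run under BigQuery standard SQL rather than the legacy SQL dialect used above, SPLIT combined with SAFE_OFFSET should give the same kind of fixed column set. This is only a sketch: the CTE name input_rows and the array alias words are made up for illustration, and a real query would read adx_catg_id from the conversions table instead of the sample string.

```sql
-- Standard SQL sketch (not legacy SQL). input_rows and words are
-- illustrative names, not part of the original question or answer.
WITH input_rows AS (
  SELECT 'arbitrary number of words separated by space.' AS StringToParse
)
SELECT
  words[SAFE_OFFSET(0)] AS Word0,
  words[SAFE_OFFSET(1)] AS Word1,
  words[SAFE_OFFSET(2)] AS Word2,
  words[SAFE_OFFSET(3)] AS Word3,
  words[SAFE_OFFSET(4)] AS Word4,
  words[SAFE_OFFSET(5)] AS Word5,
  words[SAFE_OFFSET(6)] AS Word6,
  -- SAFE_OFFSET returns NULL instead of raising an error when the
  -- array has no element at that position.
  words[SAFE_OFFSET(7)] AS Word7
FROM (
  -- Split on a single space; each word becomes one array element.
  SELECT SPLIT(StringToParse, ' ') AS words
  FROM input_rows
);
```

Like the regex version, this still yields a fixed number of columns, since a SQL result schema cannot vary per row. Unlike the regex, which matches any whitespace, it splits on a single space character only, so consecutive spaces would produce empty-string elements.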
Hope this helps.