Transform data in Google BigQuery - extract text, split it into multiple columns and pivot the data

Problem description

I have some weblog data in BigQuery that I need to transform to make it easier to use and query. The data looks like:

I want to extract and transform the data within the curly brackets after Results{...} (colored blue). The data is of the form ‘(\d+((PQ)|(KL))+\d+)’, and there can be 1 to 20+ entries in the result array. I am only interested in the first 16 entries.
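As a minimal local sketch (Python rather than SQL, so the pattern can be checked without a BigQuery project), the item pattern can be rewritten with non-capturing groups. This matters in BigQuery as well: REGEXP_EXTRACT and REGEXP_EXTRACT_ALL return an error when the pattern has more than one capturing group, and ‘(\d+((PQ)|(KL))+\d+)’ has four. Python's `re` and BigQuery's RE2 engine treat these particular constructs the same way. The names `ITEM` and `first_items` are illustrative.

```python
import re

# Same items as the question's '(\d+((PQ)|(KL))+\d+)', but with
# non-capturing groups, so findall returns whole matches (not tuples).
ITEM = re.compile(r"\d+(?:PQ|KL)+\d+")

MAX_ITEMS = 16  # only the first 16 entries are of interest


def first_items(results_body, limit=MAX_ITEMS):
    """Return up to `limit` items found in the text inside Results{...}."""
    return ITEM.findall(results_body)[:limit]


# Items taken from the answer's sample string; the trailing '123' is not a
# complete item, so it is not matched.
print(first_items("1234PQ5,6789KL0,1234PQ5,6789KL0,123"))
# ['1234PQ5', '6789KL0', '1234PQ5', '6789KL0']
```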

I have been able to extract the data within the curly brackets into a new column using SUBSTR and REGEXP_EXTRACT. But I'm unable to SPLIT it into columns (sometimes there is only 1 result, so the delimiter "," is missing). I'm new to regex; maybe I can use something like ‘(\d+((PQ)|(KL))+\d+){1}’ to split the data into multiple columns and then pivot it.
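A missing comma does not actually block splitting: splitting a single-item string on "," simply returns one element, and BigQuery's `SPLIT('1234PQ5', ',')` likewise returns a one-element array. The sketch below assumes the raw value contains a literal `Results{...}` segment, as described above; the pattern and the function name `split_results` are hypothetical.

```python
import re

# Assumes the raw value contains a literal "Results{...}" segment.
RESULTS_BODY = re.compile(r"Results\{([^}]*)\}")


def split_results(raw):
    """Return the comma-separated entries inside Results{...} as a list."""
    match = RESULTS_BODY.search(raw)
    if match is None:
        return []
    # split(",") does not need a comma to be present: a lone item
    # comes back as a one-element list.
    return [part for part in match.group(1).split(",") if part]


print(split_results("Results{1234PQ5}"))          # ['1234PQ5']
print(split_results("Results{1234PQ5,6789KL0}"))  # ['1234PQ5', '6789KL0']
```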

Ideal output in my case would be to transform it into something like:

In the output above, each row in the original table is repeated 1 to 16 times, depending on the number of items in the Results array.
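This long format (one output row per result item, repeating the event's columns) is what BigQuery Standard SQL produces with `FROM t, UNNEST(REGEXP_EXTRACT_ALL(t.results, r'\d+(?:PQ|KL)+\d+')) AS item WITH OFFSET AS pos` plus `WHERE pos < 16`, where `t` and `results` are placeholder names. A Python emulation of that logic, assuming each input row is an `(event_id, results_text)` pair (hypothetical names), might look like:

```python
import re

ITEM = re.compile(r"\d+(?:PQ|KL)+\d+")


def explode(rows, limit=16):
    """Yield (event_id, position, item) once per result item, like UNNEST ... WITH OFFSET.

    `rows` is an iterable of (event_id, results_text) pairs; both names are
    hypothetical stand-ins for the real table's columns.
    """
    for event_id, results_text in rows:
        for offset, item in enumerate(ITEM.findall(results_text)[:limit]):
            # WITH OFFSET is 0-based; report a 1-based position instead.
            yield event_id, offset + 1, item


events = [("e1", "1234PQ5,6789KL0"), ("e2", "1234PQ5")]
for row in explode(events):
    print(row)
```

Like a comma join with UNNEST, an event with no matching items produces no rows here; in BigQuery, `LEFT JOIN UNNEST(...)` would keep such events with a NULL item.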

I’m not completely sure whether this is possible in BigQuery. I’d be grateful if anyone could help me out a little here.

If this is not possible, I can instead have 16 rows for every event, with NULL values in Event_details where the result array has fewer than 16 entries.
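For the fixed 16-rows-per-event fallback, one approach (an assumption, not something from the thread) is to pad the item list to 16 slots with NULLs. In BigQuery this could be done by unnesting `GENERATE_ARRAY(0, 15)` and reading `items[SAFE_OFFSET(pos)]`, which yields NULL past the end of the array. The same idea in Python, with `None` standing in for NULL and the hypothetical names `padded_rows` and `SLOTS`:

```python
import re

ITEM = re.compile(r"\d+(?:PQ|KL)+\d+")
SLOTS = 16  # fixed number of rows per event


def padded_rows(event_id, results_text, slots=SLOTS):
    """Return exactly `slots` rows for one event; missing entries are None (NULL)."""
    items = ITEM.findall(results_text)[:slots]
    items += [None] * (slots - len(items))
    return [(event_id, pos, item) for pos, item in enumerate(items, start=1)]


rows = padded_rows("e1", "1234PQ5,6789KL0")
print(len(rows))  # 16
print(rows[:3])   # [('e1', 1, '1234PQ5'), ('e1', 2, '6789KL0'), ('e1', 3, None)]
```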

If neither of these is possible, the last resort would be to transform it into something like:

The reason I want to transform the data is that in most cases I need to find which result-array items appear and in what order.
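Once the items carry a position, "which items appear and in what order" reduces to looking up positions. A small Python sketch, assuming the order of interest is the first occurrence of each item (the question does not say how repeated items should be treated); `first_positions` and `appears_before` are hypothetical helpers:

```python
import re

ITEM = re.compile(r"\d+(?:PQ|KL)+\d+")


def first_positions(results_text, limit=16):
    """Map each distinct item to the 1-based position of its first occurrence."""
    positions = {}
    for pos, item in enumerate(ITEM.findall(results_text)[:limit], start=1):
        positions.setdefault(item, pos)
    return positions


def appears_before(results_text, a, b):
    """True if item `a` first occurs before item `b`; False if either is absent."""
    pos = first_positions(results_text)
    return a in pos and b in pos and pos[a] < pos[b]


sample = "1234PQ5,6789KL0,1234PQ5"
print(first_positions(sample))                       # {'1234PQ5': 1, '6789KL0': 2}
print(appears_before(sample, "6789KL0", "1234PQ5"))  # False
```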

Solution

Check this out: Split string into multiple columns with bigquery. In that case the string is delimited by spaces; replace the \s with ','.

Something like:

```sql
SELECT
  -- Skip to the first '{', then n comma-terminated items, then capture item n (0-based).
  REGEXP_EXTRACT(StringToParse, r'^[^{]*\{(?:[^,]*,){0}(\d+(?:PQ|KL)+\d+)') AS Word0,
  REGEXP_EXTRACT(StringToParse, r'^[^{]*\{(?:[^,]*,){1}(\d+(?:PQ|KL)+\d+)') AS Word1,
  REGEXP_EXTRACT(StringToParse, r'^[^{]*\{(?:[^,]*,){2}(\d+(?:PQ|KL)+\d+)') AS Word2,
  REGEXP_EXTRACT(StringToParse, r'^[^{]*\{(?:[^,]*,){3}(\d+(?:PQ|KL)+\d+)') AS Word3
FROM
  (SELECT 'bla{1234PQ5,6789KL0,1234PQ5,6789KL0,123' AS StringToParse)
```
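The per-column approach generalizes to all 16 positions by changing only the `{n}` repeat count. The Python sketch below builds the same kind of pattern for n = 0..15 and returns `None` where REGEXP_EXTRACT would return NULL; `nth_item_pattern`, `to_columns` and `MAX_COLUMNS` are illustrative names. Because each `[^,]*,` segment must end at a comma, skipping n segments lands exactly on item n, and a single item with no comma still fills Word0.

```python
import re

MAX_COLUMNS = 16


def nth_item_pattern(n):
    """Pattern for the n-th item (0-based): skip to the first '{', then
    skip n comma-terminated segments, then capture one item."""
    return re.compile(r"^[^{]*\{(?:[^,]*,){" + str(n) + r"}(\d+(?:PQ|KL)+\d+)")


def to_columns(s, columns=MAX_COLUMNS):
    """Word0..Word{columns-1}; None where REGEXP_EXTRACT would return NULL."""
    out = []
    for n in range(columns):
        match = nth_item_pattern(n).search(s)
        out.append(match.group(1) if match else None)
    return out


words = to_columns("bla{1234PQ5,6789KL0,1234PQ5,6789KL0,123")
print(words[:5])  # ['1234PQ5', '6789KL0', '1234PQ5', '6789KL0', None]
```

The fifth column is `None` because the truncated `123` in the sample is not a complete item, and all later columns are `None` because there are no more segments.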
