Pivot multi-level nested fields in BigQuery


Question

My BigQuery table schema:

Continuing from this post: "bigquery pivoting with nested field". I am trying to flatten this table. I would like to unnest the timeseries.data field, i.e. the final number of rows should equal the total length of the timeseries.data arrays. I would also like to add annotation.properties.key entries with certain values as additional columns, with annotation.properties.value as their values. In this case, that would be the "margin" column. However, the following query gives me the error "Unrecognized name: data", even though after the last FROM I already have: unnest(timeseries.data) as data.

flow_timestamp, channel_name, number_of_digits, timestamp, value, margin
2019-10-31 15:31:15.079674 UTC, channel_1, 4, 2018-02-28T02:00:00, 50, 0.01

Query:

SELECT 
  flow_timestamp, timeseries.channel_name, 

  ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ),
  data.timestamp ,data.value

,(with subq as (select * from unnest(data.annotation))
select max(if (properties.key = 'margin', properties.value, null))
from (
select * from unnest(subq.properties)
) as properties
) as margin

FROM my_table
left join unnest(timeseries.data) as data

WHERE DATE(flow_timestamp) between "2019-10-28" and "2019-11-02" 
order by flow_timestamp

Recommended answer

Try the following:

#standardSQL
SELECT 
  flow_timestamp, 
  timeseries.channel_name, 
  ( SELECT MAX(IF(channel_properties.key = 'number_of_digits', channel_properties.value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ) AS number_of_digits, 
  item.timestamp, 
  item.value, 
  ( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL)) 
    FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) prop
  ) AS margin  
FROM my_table 
LEFT JOIN UNNEST(timeseries.data) item
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02' 
ORDER BY flow_timestamp

Below is a slightly less verbose version of the same solution, but I usually prefer the one above because it is simpler to maintain:

#standardSQL
SELECT 
  flow_timestamp, 
  timeseries.channel_name, 
  ( SELECT MAX(IF(key = 'number_of_digits', value, NULL)) 
    FROM UNNEST(timeseries.channel_properties) AS channel_properties
  ) AS number_of_digits, 
  timestamp, 
  value, 
  ( SELECT MAX(IF(key = 'margin', value, NULL)) 
    FROM UNNEST(annotation), UNNEST(properties) 
  ) AS margin  
FROM my_table 
LEFT JOIN UNNEST(timeseries.data)   
WHERE DATE(flow_timestamp) BETWEEN '2019-10-28' AND '2019-11-02' 
ORDER BY flow_timestamp
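
To try the approach without access to the real table, the first query can be run against inline sample data. The sketch below is only an illustration: the schema of my_table is inferred from the field paths used in the queries above (timeseries as a STRUCT, with channel_properties, data, annotation and properties as arrays of structs), and the column types, including STRING for the key/value pairs, are assumptions. The sample values are taken from the expected output row in the question.

```sql
#standardSQL
-- Hypothetical stand-in for my_table; the real schema and types are assumed.
WITH my_table AS (
  SELECT
    TIMESTAMP '2019-10-31 15:31:15.079674 UTC' AS flow_timestamp,
    STRUCT(
      'channel_1' AS channel_name,
      -- key/value pairs describing the channel
      [STRUCT('number_of_digits' AS key, '4' AS value)] AS channel_properties,
      -- one element per data point; each carries its own annotations
      [STRUCT(
         '2018-02-28T02:00:00' AS `timestamp`,
         50 AS value,
         [STRUCT(
            [STRUCT('margin' AS key, '0.01' AS value)] AS properties
          )] AS annotation
       )] AS data
    ) AS timeseries
)
SELECT
  flow_timestamp,
  timeseries.channel_name,
  -- pivot one key of the channel-level key/value array into a column
  ( SELECT MAX(IF(p.key = 'number_of_digits', p.value, NULL))
    FROM UNNEST(timeseries.channel_properties) AS p
  ) AS number_of_digits,
  item.timestamp,
  item.value,
  -- two-level unnest: annotation array first, then its properties array
  ( SELECT MAX(IF(prop.key = 'margin', prop.value, NULL))
    FROM UNNEST(item.annotation) AS annot, UNNEST(annot.properties) AS prop
  ) AS margin
FROM my_table
LEFT JOIN UNNEST(timeseries.data) AS item
ORDER BY flow_timestamp
```

With this sample data the query should produce one row per element of timeseries.data, here a single row with the same columns as the expected output in the question. Each correlated subquery reads an array that belongs to the current row (or to the current item), which is why the outer LEFT JOIN UNNEST alias is used directly in the FROM clause of the inner subquery.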
