大型查询主键和聚合重复字段 [英] Big Query pivot and aggregate repeated fields

查看:198
本文介绍了大型查询主键和聚合重复字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想旋转字段'unitId','firebase_screen_class',以便每个字段出现在单独的列中:

  SELECT 
event.name,
event_param.value.string_value AS ad_unit,
COUNT(*)AS event_count
FROM
`app_events_20170510`,
UNNEST (event_dim)AS事件
UNNEST(event.params)as event_param
WHERE
event.name in('Ad_requested','Ad_clicked','Ad_shown')
和event_param .key in('unitId','screen_class')
GROUP BY 1,2

我使用了以下使用传统SQL的查询,但没有显示正确的聚合结果:

  SELECT event_name,ad_unit,count( *)FROM 

SELECT
event_dim.name as event_name,
MAX(IF(event_dim.params.key =firebase_screen_class,event_dim.params.value.string_value,NULL ))WITHIN RECORD as firebase_screen_class,
MAX(IF(event_dim.params.key =unit Id',event_dim.params.value.string_value,NULL))WITHIN RECORD as ad_unit
FROM
[app_events_20170510]
WHERE
event_dim.name in('Ad_requested','Ad_shown ','Ad_clicked')
和event_dim.params.key('unitId','screen_class')

group by 1,2



我正在寻找以下输出:

  _________________________________________________________________________ 
| event_dim.name | unitId | screen_class | count_events |
| __________________ | ________________ | ______________________ | _____________ |
| Ad_requested | hpg | socialFeed | 520 |
| __________________ | ________________ | ______________________ | _____________ |
| Ad_shown | hpg |聊天| 950 |
| __________________ | ________________ | ______________________ | _____________ |
| Ad_requested | hni |聊天| 740 |
| __________________ | ________________ | ______________________ | _____________ |

所有事件 Ad_requested Ad_shown Ad_clicked 具有相同键( unitId screen_class )以及每个键的相同值( unitId hpg hni / screen_class socialFeed chat

解决方案

以下是针对BigQuery标准SQL



$ b

#standardSQL
WITH`aggregation` AS(
SELECT
event.name,
event_param.key,
COUNT(*)AS event_count
FROM
`app_events_20170510`,
UNNEST(event_dim)AS事件,
UNNEST(event.params)AS event_param
WHERE
event.name IN('Ad_requested','Ad_clicked','Ad_shown')
AND event_param.key IN('unitId', 'firebase_screen_class','house')
GROUP BY 1,2

SELECT
name,
MAX(IF(key ='unitId',event_count,NULL))AS unitId,
MAX(IF ='firebase_screen_class',event_count,NULL))AS firebase_screen_class,
MAX(IF)(key ='house',event_count,NULL))AS house
FROM`aggregation`
GROUP BY name




根据评论中的说明进行更新:



$ b

  #standardSQL 
SELECT
event.name,
(SELECT value.string_value FROM UNNEST(event.params)WHERE key ='unitId')AS unitId,
(SELECT value.string_value FROM UNNEST(event.params)WHERE key ='firebase_screen_class')AS firebase_screen_class,
(SELECT value.string_value FROM UNNEST(event.params)WHERE key ='house')AS house,
COUNT(1)AS event_count
FROM`app_events_20170510`,UNNEST(event_dim)AS事件
WHERE event.name IN('Ad_requested','Ad_click ed','Ad_shown')
GROUP BY 1,2,3,4




... 出于好奇,我尝试使用传统SQL复制查询 ...



新增版本对于BigQuery Legacy SQL(纯粹是为了学习的目的,希望帮助那些考虑将标准SQL迁移到标准SQL作为两个版本的相同任务的人现在在这里提供)




#legacySQL
SELECT name,product_id,source,firebase_event_origin,COUNT(1)as event_count
FROM(
SELECT event_dim.name AS name,
MAX(IF(event_dim.params.key ='unitId',event_dim.params.value.string_value,NULL))WITHIN RECORD AS unitId,
MAX(IF(event_dim.params.key ='firebase_screen_class',event_dim.params.value.string_value,NULL))WITHIN RECORD as firebase_screen_class,
MAX(IF(event_dim.params.key ='house ',event_dim.params.value.string_value,NU LL))WITHIN RECORD AS house,
FROM FLATTEN([project:dataset.app_events_20170510],event_dim)AS事件
WHERE event_dim.name IN('Ad_requested','Ad_clicked','Ad_shown')

GROUP BY 1,2,3,4


I would like to pivot the fields 'unitId', 'firebase_screen_class' so that each one appears in a separate column:

SELECT
  event.name,
  event_param.value.string_value AS ad_unit,
  COUNT(*) AS event_count
FROM
  `app_events_20170510`, 
  UNNEST(event_dim) AS event, 
  UNNEST(event.params) as event_param
WHERE
  event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')
  and event_param.key in ('unitId', 'screen_class')
GROUP BY 1,2

I have used the following query using legacy SQL but it is not showing the correct aggregation results:

SELECT event_name, ad_unit, count(*) FROM
(
SELECT
  event_dim.name as event_name,
  MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) WITHIN RECORD as firebase_screen_class,
  MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) WITHIN RECORD as ad_unit
FROM
  [app_events_20170510]
WHERE
  event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')
  and event_dim.params.key in ('unitId','screen_class')
)
group by 1,2

I am looking for the following output:

_________________________________________________________________________
| event_dim.name   | unitId         | screen_class         | count_events|
|__________________|________________|______________________|_____________|
| Ad_requested     | hpg            | socialFeed           |    520      |
|__________________|________________|______________________|_____________|
| Ad_shown         | hpg            | chat                 |    950      |
|__________________|________________|______________________|_____________|
| Ad_requested     | hni            | chat                 |    740      |
|__________________|________________|______________________|_____________|

All events Ad_requested, Ad_shown and Ad_clicked have parameters with the same keys(unitId, screen_class) and also the same values for each key (unitId: hpg, hni / screen_class: socialFeed, chat)

解决方案

Below is for BigQuery Standard SQL

#standardSQL
WITH `aggregation` AS (
  SELECT
    event.name,
    event_param.key,
    COUNT(*) AS event_count
  FROM
    `app_events_20170510`, 
    UNNEST(event_dim) AS event, 
    UNNEST(event.params) AS event_param
  WHERE
    event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
    AND event_param.key IN ('unitId', 'firebase_screen_class','house')
  GROUP BY 1, 2
)
SELECT 
  name,
  MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
  MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,
  MAX(IF(key = 'house', event_count, NULL)) AS house
FROM `aggregation`
GROUP BY name  

Update based on clarifications in comments:

#standardSQL
SELECT
  event.name,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,
  COUNT(1) AS event_count
FROM `app_events_20170510`, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1,2,3,4

... Out of curiosity, I tried to replicate the query using legacy SQL ... -

Added version for BigQuery Legacy SQL (purely for learning purposes and in hope to help those who consider migrating to Standard SQL as two versions of same task is now presented here)

#legacySQL
SELECT name, product_id, source, firebase_event_origin, COUNT(1) AS event_count
FROM (
  SELECT event_dim.name AS name,
    MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS unitId,
    MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS firebase_screen_class,
    MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS house,
  FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS event
  WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
GROUP BY 1, 2, 3, 4

这篇关于大型查询主键和聚合重复字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆