Big Query 透视和聚合重复字段 [英] Big Query pivot and aggregate repeated fields

查看:17
本文介绍了Big Query 透视和聚合重复字段的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想旋转字段unitId"、firebase_screen_class",以便每个字段都出现在单独的列中:

SELECT事件名称,event_param.value.string_value AS ad_unit,COUNT(*) AS event_count从`app_events_20170510`,UNNEST(event_dim) AS 事件,UNNEST(event.params) 作为 event_param在哪里event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')和 event_param.key ('unitId', 'screen_class')按 1,2 分组

我使用旧版 SQL 使用了以下查询,但它没有显示正确的聚合结果:

SELECT event_name, ad_unit, count(*) FROM(选择event_dim.name 作为 event_name,MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) 记录为 firebase_screen_class,MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) 记录为 ad_unit从[app_events_20170510]在哪里event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')和 event_dim.params.key ('unitId','screen_class'))按 1,2 分组

我正在寻找以下输出:

_______________________________________________________________________|event_dim.name |单位 ID |screen_class |count_events||__________________|________________|______________________|_____________||Ad_requested |hpg |社会饲料 |520 ||__________________|________________|______________________|_____________||Ad_shown |hpg |聊天 |950 ||__________________|________________|______________________|_____________||Ad_requested |海尼 |聊天 |第740话|__________________|________________|______________________|_____________|

所有事件Ad_requestedAd_shownAd_clicked 的参数都具有相同的键(unitId, screen_class) 以及每个键的相同值 (unitId: hpg, hni/screen_class: socialFeed, chat)

解决方案

以下为 BigQuery Standard SQL

#standardSQLWITH`聚合`AS(选择事件名称,event_param.key,COUNT(*) AS event_count从`app_events_20170510`,UNNEST(event_dim) AS 事件,UNNEST(event.params) AS event_param在哪里event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')AND event_param.key IN ('unitId', 'firebase_screen_class','house')按 1、2 分组)选择名称,MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,MAX(IF(key = 'house', event_count, NULL)) 作为房子从`聚合`按名称分组

<块引用>

根据评论中的说明进行更新:

#standardSQL选择事件名称,(SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,(SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,(SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,COUNT(1) AS event_countFROM `app_events_20170510`, UNNEST(event_dim) AS 事件WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')按 1,2,3,4 分组

<块引用>

... 出于好奇,我尝试使用旧 SQL 复制查询 ... -

为 BigQuery Legacy SQL 添加了版本(纯粹出于学习目的,希望能帮助那些考虑迁移到标准 SQL 的人,因为这里提供了相同任务的两个版本)

#legacySQLSELECT 名称、product_id、来源、firebase_event_origin、COUNT(1) AS event_count从 (SELECT event_dim.name AS 名称,MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) 记录为 unitId,MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) 记录为 firebase_screen_class,MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) 记录为房子,FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS 事件WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown'))按 1、2、3、4 分组

I would like to pivot the fields 'unitId', 'firebase_screen_class' so that each one appears in a separate column:

SELECT
  event.name,
  event_param.value.string_value AS ad_unit,
  COUNT(*) AS event_count
FROM
  `app_events_20170510`, 
  UNNEST(event_dim) AS event, 
  UNNEST(event.params) as event_param
WHERE
  event.name in ('Ad_requested', 'Ad_clicked', 'Ad_shown')
  and event_param.key in ('unitId', 'screen_class')
GROUP BY 1,2

I have used the following query using legacy SQL but it is not showing the correct aggregation results:

SELECT event_name, ad_unit, count(*) FROM
(
SELECT
  event_dim.name as event_name,
  MAX(IF(event_dim.params.key = "firebase_screen_class", event_dim.params.value.string_value, NULL)) WITHIN RECORD as firebase_screen_class,
  MAX(IF(event_dim.params.key = "unitId", event_dim.params.value.string_value, NULL)) WITHIN RECORD as ad_unit
FROM
  [app_events_20170510]
WHERE
  event_dim.name in ('Ad_requested','Ad_shown', 'Ad_clicked')
  and event_dim.params.key in ('unitId','screen_class')
)
group by 1,2

I am looking for the following output:

_________________________________________________________________________
| event_dim.name   | unitId         | screen_class         | count_events|
|__________________|________________|______________________|_____________|
| Ad_requested     | hpg            | socialFeed           |    520      |
|__________________|________________|______________________|_____________|
| Ad_shown         | hpg            | chat                 |    950      |
|__________________|________________|______________________|_____________|
| Ad_requested     | hni            | chat                 |    740      |
|__________________|________________|______________________|_____________|

All events Ad_requested, Ad_shown and Ad_clicked have parameters with the same keys(unitId, screen_class) and also the same values for each key (unitId: hpg, hni / screen_class: socialFeed, chat)

解决方案

Below is for BigQuery Standard SQL

#standardSQL
WITH `aggregation` AS (
  SELECT
    event.name,
    event_param.key,
    COUNT(*) AS event_count
  FROM
    `app_events_20170510`, 
    UNNEST(event_dim) AS event, 
    UNNEST(event.params) AS event_param
  WHERE
    event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
    AND event_param.key IN ('unitId', 'firebase_screen_class','house')
  GROUP BY 1, 2
)
SELECT 
  name,
  MAX(IF(key = 'unitId', event_count, NULL)) AS unitId,
  MAX(IF(key = 'firebase_screen_class', event_count, NULL)) AS firebase_screen_class,
  MAX(IF(key = 'house', event_count, NULL)) AS house
FROM `aggregation`
GROUP BY name  

Update based on clarifications in comments:

#standardSQL
SELECT
  event.name,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'unitId') AS unitId,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'firebase_screen_class') AS firebase_screen_class,
  (SELECT value.string_value FROM UNNEST(event.params) WHERE key = 'house') AS house,
  COUNT(1) AS event_count
FROM `app_events_20170510`, UNNEST(event_dim) AS event
WHERE event.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
GROUP BY 1,2,3,4

... Out of curiosity, I tried to replicate the query using legacy SQL ... -

Added version for BigQuery Legacy SQL (purely for learning purposes and in hope to help those who consider migrating to Standard SQL as two versions of same task is now presented here)

#legacySQL
SELECT name, product_id, source, firebase_event_origin, COUNT(1) AS event_count
FROM (
  SELECT event_dim.name AS name,
    MAX(IF(event_dim.params.key = 'unitId', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS unitId,
    MAX(IF(event_dim.params.key = 'firebase_screen_class', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS firebase_screen_class,
    MAX(IF(event_dim.params.key = 'house', event_dim.params.value.string_value, NULL)) WITHIN RECORD AS house,
  FROM FLATTEN([project:dataset.app_events_20170510], event_dim) AS event
  WHERE event_dim.name IN ('Ad_requested', 'Ad_clicked', 'Ad_shown')
)
GROUP BY 1, 2, 3, 4

这篇关于Big Query 透视和聚合重复字段的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆