Query key value in different columns from Google BigQuery

Question

I collect analytics with Firebase Analytics, which I have linked to Google BigQuery.

I have the following data in BigQuery (unnecessary columns and rows are omitted; the dataset looks similar to https://bigquery.cloud.google.com/table/firebase-analytics-sample-data:ios_dataset.app_events_20160607?tab=preview):

| event_dim.name | event_dim.params.key | event_dim.params.value.string_value |
|----------------|----------------------|-------------------------------------|
| read_post      | post_id              | p_100                               |
|                | group_id             | g_1                                 |
|                | user_id              | u_1                                 |
| open_group     | post_id              | p_200                               |
|                | group_id             | g_2                                 |
|                | user_id              | u_1                                 |
| open_group     | post_id              | p_300                               |
|                | group_id             | g_1                                 |
|                | user_id              | u_3                                 |

I want to query the following data:

  • event name
  • user id
  • group id

I tried the following legacy SQL query:

```sql
SELECT
  event_dim.name,
  FIRST(IF(event_dim.params.key = "user_id", event_dim.params.value.string_value, NULL)) WITHIN RECORD user_id,
  FIRST(IF(event_dim.params.key = "group_id", event_dim.params.value.string_value, NULL)) WITHIN RECORD group_id
FROM
  [xxx:xxx_IOS.app_events_20161102]
LIMIT
  1000
```

The problem with this query is that the aggregate function FIRST gives wrong results: a SELECT statement with a WITHIN modifier returns a list of results, and FIRST only gives the correct result for the first row.

Solution

Using standard SQL (uncheck "Use Legacy SQL" under "Show Options"), you can do:

```sql
SELECT
  event_dim.name,
  (SELECT value.string_value FROM UNNEST(params)
   WHERE key = 'user_id') AS user_id,
  (SELECT value.string_value FROM UNNEST(params)
   WHERE key = 'group_id') AS group_id
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`,
  UNNEST(event_dim) AS event_dim
LIMIT 1000;
```
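To try the standard SQL approach without access to the Firebase sample dataset, here is a minimal sketch that rebuilds the sample rows from the question as literal nested data in a `WITH` clause. The table name `app_events` and the alias `e` are hypothetical, and it assumes all three events sit in a single row; the real Firebase export has many more fields. A different alias (`e`) is used for the unnested event so it cannot be confused with the `event_dim` column.

```sql
-- Sketch: the question's sample rows as literal nested data
-- (hypothetical table name app_events; all three events in one row).
WITH app_events AS (
  SELECT [
    STRUCT('read_post' AS name, [
      STRUCT('post_id' AS key, STRUCT('p_100' AS string_value) AS value),
      STRUCT('group_id' AS key, STRUCT('g_1' AS string_value) AS value),
      STRUCT('user_id' AS key, STRUCT('u_1' AS string_value) AS value)
    ] AS params),
    STRUCT('open_group' AS name, [
      STRUCT('post_id' AS key, STRUCT('p_200' AS string_value) AS value),
      STRUCT('group_id' AS key, STRUCT('g_2' AS string_value) AS value),
      STRUCT('user_id' AS key, STRUCT('u_1' AS string_value) AS value)
    ] AS params),
    STRUCT('open_group' AS name, [
      STRUCT('post_id' AS key, STRUCT('p_300' AS string_value) AS value),
      STRUCT('group_id' AS key, STRUCT('g_1' AS string_value) AS value),
      STRUCT('user_id' AS key, STRUCT('u_3' AS string_value) AS value)
    ] AS params)
  ] AS event_dim
)
SELECT
  e.name,
  -- Correlated scalar subqueries: scan this event's params array and
  -- return the string value of the matching key (NULL if it is absent).
  (SELECT value.string_value FROM UNNEST(e.params)
   WHERE key = 'user_id') AS user_id,
  (SELECT value.string_value FROM UNNEST(e.params)
   WHERE key = 'group_id') AS group_id
FROM app_events,
  UNNEST(app_events.event_dim) AS e;  -- one output row per event
```

On this data the query should return one row per event: read_post / u_1 / g_1, open_group / u_1 / g_2 and open_group / u_3 / g_1. Row order is not guaranteed without an ORDER BY.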

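If you only want rows that have both 'user_id' and 'group_id', you can filter out the NULL values. Because aliases defined in the SELECT list (user_id, group_id) cannot be referenced in the WHERE clause of the same query, the filter goes in an outer query: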

```sql
SELECT * FROM (
  SELECT
    event_dim.name,
    (SELECT value.string_value FROM UNNEST(params)
     WHERE key = 'user_id') AS user_id,
    (SELECT value.string_value FROM UNNEST(params)
     WHERE key = 'group_id') AS group_id
  FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`,
    UNNEST(event_dim) AS event_dim
)
WHERE user_id IS NOT NULL AND group_id IS NOT NULL
LIMIT 1000;
```
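Each `(SELECT value.string_value FROM UNNEST(params) WHERE key = ...)` expression is a scalar subquery, so it must return at most one row. If the same key can appear more than once in one params array, BigQuery fails the query at run time (the error reads along the lines of "Scalar subquery produced more than one element"). One way to guard against that, sketched here with hypothetical literal data, is to keep only the first occurrence by its array position using `WITH OFFSET`:

```sql
-- Hypothetical params array in which 'user_id' appears twice.
SELECT
  (SELECT value.string_value
   FROM UNNEST(params) WITH OFFSET AS off  -- off = position in the array
   WHERE key = 'user_id'
   ORDER BY off
   LIMIT 1) AS user_id                     -- keep the first match only
FROM UNNEST([STRUCT([
  STRUCT('user_id' AS key, STRUCT('u_1' AS string_value) AS value),
  STRUCT('user_id' AS key, STRUCT('u_2' AS string_value) AS value)
] AS params)]);
```

This should return u_1; using `ORDER BY off DESC` instead would keep the last occurrence.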
