展平Firebase导出到BigQuery到表中,其中1行= 1个事件(嵌套数据内的嵌套数据) [英] Flatten Firebase exports to BigQuery into tables where 1 row = 1 event (nested data within nested data)

查看:45
本文介绍了展平Firebase导出到BigQuery到表中,其中1行= 1个事件(嵌套数据内的嵌套数据)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我认为我可以通过问一个更简单的问题(引用一个更简单的数据示例)来获得所需的内容

I thought I'd be able to get what I needed by asking a simpler question referencing a simpler data example here, but I still need some help.

我对于在BigQuery中查询json样式数据非常陌生,并且在Firebase为我转储到BigQuery中的分析(事件)数据遇到麻烦.1行数据的格式如下(删去了一些绒毛).

I'm pretty new to querying json style data within BigQuery, and am having trouble with the analytics (events) data that Firebase dumps into BigQuery for me. The format of 1 row of data is below (trimmed out some fluff).

{
  "user_dim": {
    "user_id": "some_identifier_here",
    "user_properties": [
      {
        "key": "special_key1",
        "val": {
          "val": {
            "str_val": "894",
            "int_val": null
          }
        }
      },
      {
        "key": "special_key2",
        "val": {
          "val": {
            "str_val": "1",
            "int_val": null
          }
        }
      },
      {
        "key": "special_key3",
        "val": {
          "val": {
            "str_val": "23",
            "int_val": null
          }
        }
      }
    ],
    "device_info": {
      "device_category": "mobile",
      "mobile_brand_name": "Samsung",
      "mobile_model_name": "model_phone"
    },
    "dt_a": "1470625311138000",
    "dt_b": "1470620345566000"
  },
  "event_dim": [
    {
      "name": "user_engagement",
      "params": [
        {
          "key": "firebase_event_origin",
          "value": {
            "string_value": "auto",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "engagement_time_msec",
          "value": {
            "string_value": null,
            "int_value": "30006",
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470675614434000",
      "previous_timestamp_micros": "1470675551092000"
    },
    {
      "name": "new_game",
      "params": [
        {
          "key": "total_time",
          "value": {
            "string_value": "496048",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "armor",
          "value": {
            "string_value": "2",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "reason",
          "value": {
            "string_value": "power_up",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470675825988001",
      "previous_timestamp_micros": "1470675282500001"
    },
    {
      "name": "user_engagement",
      "params": [
        {
          "key": "firebase_event_origin",
          "value": {
            "string_value": "auto",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "engagement_time_msec",
          "value": {
            "string_value": null,
            "int_value": "318030",
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470675972778002",
      "previous_timestamp_micros": "1470675614434002"
    },
    {
      "name": "won_game",
      "params": [
        {
          "key": "total_time",
          "value": {
            "string_value": "497857",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "level",
          "value": {
            "string_value": null,
            "int_value": "207",
            "float_value": null,
            "double_value": null
          }
        },
        {
          "key": "sword",
          "value": {
            "string_value": "iron",
            "int_value": null,
            "float_value": null,
            "double_value": null
          }
        }
      ],
      "timestamp_micros": "1470677171374007",
      "previous_timestamp_micros": "1470671343784007"
    }
  ]
}

基于对原始问题的回答,我已经能够很好地处理对象 user_dim 的第一部分.但是,每当我尝试对 event_dim 字段使用类似方法(取消嵌套)时,查询都会失败,并显示消息错误:标量子查询产生了多个元素".我怀疑这是由于 event_dim 本身就是一个数组,并且其中也包含具有数组的结构.

Based on the answers to my original question I've been able to work just fine with the first part of the object user_dim. However, whenever I try similar approaches to the event_dim field (unnesting it) the queries fail with the message "Error: Scalar subquery produced more than one element." I have a suspicion this is due to the fact that event_dim is an array itself, and contains structs that have arrays in them as well.

如果这有帮助,那是给我错误的基本查询,尽管应该注意的是,我在BQ中处理这种类型的数据已经超出了我的本领,并且可能会完全偏离正常轨道:

If it helps here is the basic query that is giving me the error, although it should be noted that I am quite out of my element working with this type of data in BQ and could be going completely off course:

SELECT
  (SELECT name FROM UNNEST(event_dim) WHERE name = 'user_engagement') AS event_name
FROM
  my_table;

我要获得的最终结果是一个查询,该查询可以将包含许多此类对象的表转换为一个表,该表在每个对象中为每个事件输出1行 event_dim 数组.即,对于上面的示例对象,我希望它输出4行,其中第一组列是相同的,并且只是来自 user_dim 的元数据.然后,我希望可以根据我知道的每种可能存在的事件(例如 event_name,firebase_event_origin,accounting_time_msec,total_time,armor,reason,level,sword )的存在情况明确定义的列,然后填充该事件参数的值;如果不存在,则为NULL.

The end result I am going for is a query that can turn a table that contains many of these types of objects into a table that outputs 1 row per event in each objects event_dim array. i.e. for the example object above, I'd want it to output 4 rows where the first set of columns are identical and are just the metadata from user_dim. Then I'd like columns that I can explicitly define based on what I know will exist for each possible event, like event_name, firebase_event_origin, engagement_time_msec, total_time, armor, reason, level, sword and then fill with the value from that event parameter, or NULL if it doesn't exist.

推荐答案

希望,下面可以为您提供下一次推送

Hope, below can give you next push

WITH YourTable AS (
  SELECT ARRAY[
    STRUCT(
      "user_engagement" AS name,
      ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
        STRUCT("firebase_event_origin", STRUCT("auto", NULL)),
        STRUCT("engagement_time_msec", STRUCT("30006", NULL))] AS params,
      1470675614434000 AS TIMESTAMP_MICROS,
      1470675551092000 AS previous_timestamp_micros
    ),
    STRUCT(
      "new_game" AS name,
      ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
        STRUCT("total_time", STRUCT("496048", NULL)),
        STRUCT("armor", STRUCT("2", NULL)),
        STRUCT("reason", STRUCT("power_up", NULL))] AS params,
      1470675825988001 AS TIMESTAMP_MICROS,
      1470675282500001 AS previous_timestamp_micros
    ),
    STRUCT(
      "user_engagement" AS name,
      ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
        STRUCT("firebase_event_origin", STRUCT("auto", NULL)),
        STRUCT("engagement_time_msec", STRUCT("318030", NULL))] AS params,
      1470675972778002 AS TIMESTAMP_MICROS,
      1470675614434002 AS previous_timestamp_micros
    ),
    STRUCT(
      "won_game" AS name,
      ARRAY<STRUCT<key STRING, val STRUCT<str_val STRING, int_val INT64>>>[
        STRUCT("total_time", STRUCT("497857", NULL)),
        STRUCT("level", STRUCT("207", NULL)),
        STRUCT("sword", STRUCT("iron", NULL))] AS params,
      1470677171374007 AS TIMESTAMP_MICROS,
      1470671343784007 AS previous_timestamp_micros
    )
  ] AS event_dim
)
SELECT 
  name, 
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "firebase_event_origin") AS firebase_event_origin,
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "engagement_time_msec") AS engagement_time_msec,
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "total_time") AS total_time,
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "armor") AS armor,
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "reason") AS reason,
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "level") AS level,
  (SELECT val.str_val FROM UNNEST(dim.params) WHERE key = "sword") AS sword
FROM YourTable, UNNEST(event_dim) AS dim

这篇关于展平Firebase导出到BigQuery到表中,其中1行= 1个事件(嵌套数据内的嵌套数据)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆