如何用Firebase Analytics替换旧架构中的event_dim在新架构中编写Bigquery? [英] how to write Bigquery in new schema with replacing event_dim in old schema from Firebase analytics?

查看:79
本文介绍了如何用Firebase Analytics替换旧架构中的event_dim在新架构中编写Bigquery?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

旧的BigQuery Export模式明智脚本正在运行.如下所示.但是我想复制此代码,并根据我们的Bigquery模式已更改,根据新的导出模式编写它.请在新的BigQuery导出模式中帮忙,因为我没有找到与之相对应的其他记录 event_dim (event_dim位于旧的BigQuery Export模式中).

The old BigQuery Export schema wise script is running.It is given below. But I want to replicate this code and write it according to new export schema as we Bigquery schema has been changed. Please help becasue in new BigQuery Export schema I don't find any other corresponding record against event_dim (event_dim is in according to old BigQuery Export schema).

以下是BigQuery导出架构的链接: 单击此处

 SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                event.name,
                params.value.int_value engagement_time
        FROM `xxx.app_events_*`,
        UNNEST(event_dim) as event,
        UNNEST(event.params) as params,
        UNNEST(user_dim.user_properties) as user_params
        where (event.name = "user_engagement" and params.key = "engagement_time_msec")
        and
                (user_params.key = "access" and user_params.value.value.string_value = "true") and
                PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"

在下面的查询中进行了尝试,但是我希望在一条SELECT语句中使用app_instance,min_time,max_time,event_name,engagement_time.当我使用分组依据"时,我无法一次获得所有这些信息(app_instance,min_time,max_time,event_name,engagement_time).请帮忙.

Tried the query below but what I want app_instance, min_time, max_time, event_name, engagement_time at one SELECT statement. And as I am using 'group by', I am not able to get all those (app_instance, min_time, max_time, event_name, engagement_time) at a time. Please help.

 SELECT user_pseudo_id
     , MIN(event_timestamp) AS min_time
      ,MAX(event_timestamp) AS max_time
    FROM `xxx.app_events_*` as T,
       T.event_params,
       T.user_properties,
       T.event_timestamp
    where (event_name = "user_engagement" and event_params.key = "engagement_time_msec")
    and
            (user_properties.key = "access" and user_properties.value.string_value = "true") and
            PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
            PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
    group by 1

推荐答案

将现有BQ数据集从旧模式迁移到新模式提供了有关这些字段如何更改的一些提示.

It is true that there was a schema change in the Google Analytics for Firebase BigQuery Export. Although there is no clear mapping of the old fields as compared to the new ones, the SQL query that is provided in the documentation in order to migrate existing BQ datasets from the old schema to the new one provides some hints of how have these fields changed.

我分享下面的migration_script.sql SQL查询,仅供参考,但让我根据您的用例指出最相关的更改:

I share the migration_script.sql SQL query below, just for reference, but let me pin-point the most relevant changes for your use-case:

  • event_dim 在SQL查询中被映射为 event ,但在架构中没有任何最终表示形式,因为 event_dim 不再是嵌套字段:UNNEST(event_dim) AS event
  • event_dim.timestamp_micros 映射为 event_timestamp :event.timestamp_micros AS event_timestamp
  • event_dim.name 被映射为 event_name :event.name AS event_name
  • event_param.value.int_value 映射为 event_params.value.int_value :event_param.value.int_value AS int_value
  • user_dim.user_properties 映射为 user_properties ,并且其所有嵌套值都遵循相同的结构:UNNEST(user_dim.user_properties) AS user_property) AS user_properties
  • event_dim is mapped as event in the SQL query, but does not have any final representation in the schema, because event_dim is no longer a nested field: UNNEST(event_dim) AS event
  • event_dim.timestamp_micros is mapped as event_timestamp: event.timestamp_micros AS event_timestamp
  • event_dim.name is mapped as event_name: event.name AS event_name
  • event_param.value.int_value is mapped as event_params.value.int_value: event_param.value.int_value AS int_value
  • user_dim.user_properties is mapped as user_properties, and all its nested values follow the same structure: UNNEST(user_dim.user_properties) AS user_property) AS user_properties

因此,总而言之,为了简单起见,模式更改已集中在取消嵌套多个字段上,例如,而不是必须访问event_dim.name(这需要取消嵌套并使查询复杂化). ,您可以直接查询字段event_name.

So, in summary, the schema change has been focused at unnesting several of the fields for simplicity, in such a way that, for example, instead of having to access event_dim.name (which would require unnesting and complicating the query), you can query directly the field event_name.

记住这一点,我确信您将能够使查询适应这种新模式,并且由于您不必嵌套太多字段,因此它看起来可能会更加简单.

Having this in mind, I am sure you will be able to adapt your query to this new schema, and it will probably look way more simple, given that you will not have to unnest so many fields.

为澄清起见,让我与您分享一些比较旧模式和新模式的示例BQ查询(它们使用公共Firebase表,因此您应该可以即开即用地运行它们):

Just for clarification, let me share with you a couple of sample BQ queries comparing the old and the new schema (they are using public Firebase tables, so you should be able to run them out-of-the-box):

# Old Schema - UNNEST() required because there are nested fields
SELECT
  user_dim.app_info.app_instance_id,
  MIN(event.timestamp_micros) AS min_time,
  MAX(event.timestamp_micros) AS max_time,
  event.name
FROM
  `firebase-public-project.com_firebase_demo_ANDROID.app_events_20180503`,
  UNNEST(event_dim) AS event
WHERE
  event.name = "user_engagement"
GROUP BY
  user_dim.app_info.app_instance_id,
  event.name

相比:

# New Schema - UNNEST() not required because there are no nested fields
SELECT
  user_pseudo_id,
  MIN(event_timestamp) AS min_time,
  MAX(event_timestamp) AS max_time,
  event_name
FROM
  `firebase-public-project.analytics_153293282.events_20180815`
WHERE
  event_name = "user_engagement"
GROUP BY
  user_pseudo_id,
  event_name

这些查询是等效的,但是使用新旧架构引用表.请注意,由于查询更加复杂,您可能需要添加一些UNNEST()才能访问表中其余的嵌套字段.

These queries are equivalent, but referencing tables with the old and new schema. Please note that, as your query is more complex, you may need to add some UNNEST() in order to access the remaining nested fields in the table.

此外,您可能想看看这些样本,可以为您提供一些有关如何使用新架构编写查询的想法.

Additionally, you may want to have a look at these samples that can help you with some ideas on how to write queries with the new schema.

编辑2

EDIT 2

我的理解是,类似于以下查询的查询应允许您在单个语句中查询所有字段.我将所有未汇总/过滤的字段分组,但是根据您的用例(这肯定是您需要自己工作的东西),您可能需要应用其他策略才能查询非组字段(即使用MIN/MAX过滤器等).

My understanding is that a query like the one below should allow you to query for all the fields in a single statement. I am grouping by all the non-aggregated/filtered fields, but depending on your use case (this is definitely something you would need to work on your own) you may want to apply a different strategy in order to be able to query the non-grouped fields (i.e. use a MIN/MAX filter, etc.).

SELECT
  user_pseudo_id,
  MIN(event_timestamp) AS min_time,
  MAX(event_timestamp) AS max_time,
  event_name,
  par.value.int_value AS engagement_time
FROM
  `firebase-public-project.analytics_153293282.events_20180815`,
  UNNEST(event_params) as par
WHERE
  event_name = "user_engagement" AND par.key = "engagement_time_msec"
GROUP BY
  user_pseudo_id,
  event_name,
  par.value.int_value


附件


ANNEX

migration_script.sql:

migration_script.sql:

  SELECT
  @date AS event_date,
  event.timestamp_micros AS event_timestamp,
  event.previous_timestamp_micros AS event_previous_timestamp,
  event.name AS event_name,
  event.value_in_usd  AS event_value_in_usd,
   user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
  user_dim.bundle_info.server_timestamp_offset_micros as event_server_timestamp_offset,
  (
  SELECT
    ARRAY_AGG(STRUCT(event_param.key AS key,
        STRUCT(event_param.value.string_value AS string_value,
          event_param.value.int_value AS int_value,
          event_param.value.double_value AS double_value, 
          event_param.value.float_value AS float_value) AS value))
  FROM
    UNNEST(event.params) AS event_param) AS event_params,
  user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
  user_dim.user_id AS user_id,
  user_dim.app_info.app_instance_id AS user_pseudo_id,
  "" AS stream_id,
  user_dim.app_info.app_platform AS platform,
  STRUCT( user_dim.ltv_info.revenue AS revenue,
    user_dim.ltv_info.currency AS currency ) AS user_ltv,
  STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
      user_dim.traffic_source.user_acquired_medium AS medium,
      user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
  STRUCT( user_dim.geo_info.continent AS continent,
    user_dim.geo_info.country AS country,
    user_dim.geo_info.region AS region,
    user_dim.geo_info.city AS city ) AS geo,
  STRUCT( user_dim.device_info.device_category AS category,
    user_dim.device_info.mobile_brand_name,
    user_dim.device_info.mobile_model_name,
    user_dim.device_info.mobile_marketing_name,
    user_dim.device_info.device_model AS mobile_os_hardware_model,
    @platform AS operating_system,
    user_dim.device_info.platform_version AS operating_system_version,
    user_dim.device_info.device_id AS vendor_id,
    user_dim.device_info.resettable_device_id AS advertising_id,
    user_dim.device_info.user_default_language AS language,
    user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
    IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
  STRUCT( user_dim.app_info.app_id AS id,
    @firebase_app_id  AS firebase_app_id,
    user_dim.app_info.app_version AS version,
    user_dim.app_info.app_store AS install_source ) AS app_info,
  (
  SELECT
    ARRAY_AGG(STRUCT(user_property.key AS key,
        STRUCT(user_property.value.value.string_value AS string_value,
          user_property.value.value.int_value AS int_value,
          user_property.value.value.double_value AS double_value,
          user_property.value.value.float_value AS float_value,
          user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
  FROM
    UNNEST(user_dim.user_properties) AS user_property) AS user_properties
FROM
  `SCRIPT_GENERATED_TABLE_NAME`,
  UNNEST(event_dim) AS event

这篇关于如何用Firebase Analytics替换旧架构中的event_dim在新架构中编写Bigquery?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆