如何在新架构中编写 Bigquery,并从 Firebase 分析替换旧架构中的 event_dim? [英] how to write Bigquery in new schema with replacing event_dim in old schema from Firebase analytics?

查看:18
本文介绍了如何在新架构中编写 Bigquery,并从 Firebase 分析替换旧架构中的 event_dim?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

旧的 BigQuery Export 模式智能脚本正在运行.如下所示.但是我想复制此代码并根据新的导出模式编写它,因为我们的 Bigquery 模式已更改.请帮忙,因为在新的 BigQuery 导出架构中我没有找到任何其他相应的记录event_dim(event_dim 根据旧的 BigQuery Export 架构).

The old BigQuery Export schema wise script is running.It is given below. But I want to replicate this code and write it according to new export schema as we Bigquery schema has been changed. Please help becasue in new BigQuery Export schema I don't find any other corresponding record against event_dim (event_dim is in according to old BigQuery Export schema).

这是 BigQuery 导出架构的链接: 点击这里

 SELECT user_dim.app_info.app_instance_id
          , (SELECT MIN(timestamp_micros) FROM UNNEST(event_dim)) min_time
          , (SELECT MAX(timestamp_micros) FROM UNNEST(event_dim)) max_time,
                event.name,
                params.value.int_value engagement_time
        FROM `xxx.app_events_*`,
        UNNEST(event_dim) as event,
        UNNEST(event.params) as params,
        UNNEST(user_dim.user_properties) as user_params
        where (event.name = "user_engagement" and params.key = "engagement_time_msec")
        and
                (user_params.key = "access" and user_params.value.value.string_value = "true") and
                PARSE_DATE('%Y%m%d', event.date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
                PARSE_DATE('%Y%m%d', event.date) <= "{{upto_date (yyyy-mm-dd)}}"

尝试了下面的查询,但我想要一个 SELECT 语句中的 app_instance、min_time、max_time、event_name、engagement_time.由于我使用的是分组依据",我无法一次获得所有这些(app_instance、min_time、max_time、event_name、engagement_time).请帮忙.

Tried the query below but what I want app_instance, min_time, max_time, event_name, engagement_time at one SELECT statement. And as I am using 'group by', I am not able to get all those (app_instance, min_time, max_time, event_name, engagement_time) at a time. Please help.

 SELECT user_pseudo_id
     , MIN(event_timestamp) AS min_time
      ,MAX(event_timestamp) AS max_time
    FROM `xxx.app_events_*` as T,
       T.event_params,
       T.user_properties,
       T.event_timestamp
    where (event_name = "user_engagement" and event_params.key = "engagement_time_msec")
    and
            (user_properties.key = "access" and user_properties.value.string_value = "true") and
            PARSE_DATE('%Y%m%d', event_date) >= date_sub("{{upto_date (yyyy-mm-dd)}}", interval {{last n days}} day) and
            PARSE_DATE('%Y%m%d', event_date) <= "{{upto_date (yyyy-mm-dd)}}"
    group by 1

推荐答案

Google Analytics for Firebase BigQuery 导出.尽管与新字段相比,旧字段没有明确的映射,但文档中提供的 SQL 查询是为了 将现有的 BQ 数据集从旧模式迁移到新模式 提供了一些关于这些字段如何变化的提示.

It is true that there was a schema change in the Google Analytics for Firebase BigQuery Export. Although there is no clear mapping of the old fields as compared to the new ones, the SQL query that is provided in the documentation in order to migrate existing BQ datasets from the old schema to the new one provides some hints of how have these fields changed.

我在下面分享了 migration_script.sql SQL 查询,仅供参考,但让我指出与您的用例最相关的更改:

I share the migration_script.sql SQL query below, just for reference, but let me pin-point the most relevant changes for your use-case:

  • event_dim 在 SQL 查询中被映射为 event,但在模式中没有任何最终表示,因为 event_dim 不再是嵌套字段:UNNEST(event_dim) AS event
  • event_dim.timestamp_micros 映射为 event_timestamp:event.timestamp_micros AS event_timestamp
  • event_dim.name 映射为 event_name:event.name AS event_name
  • event_param.value.int_value 映射为 event_params.value.int_value:event_param.value.int_value AS int_value
  • user_dim.user_properties 被映射为 user_properties,并且它的所有嵌套值都遵循相同的结构:<代码>UNNEST(user_dim.user_properties) A​​S user_property) AS user_properties
  • event_dim is mapped as event in the SQL query, but does not have any final representation in the schema, because event_dim is no longer a nested field: UNNEST(event_dim) AS event
  • event_dim.timestamp_micros is mapped as event_timestamp: event.timestamp_micros AS event_timestamp
  • event_dim.name is mapped as event_name: event.name AS event_name
  • event_param.value.int_value is mapped as event_params.value.int_value: event_param.value.int_value AS int_value
  • user_dim.user_properties is mapped as user_properties, and all its nested values follow the same structure: UNNEST(user_dim.user_properties) AS user_property) AS user_properties

所以,总而言之,为了简单起见,架构更改的重点是取消几个字段的嵌套,例如,不必访问 event_dim.name(这将需要取消嵌套并使查询复杂化),您可以直接查询字段event_name.

So, in summary, the schema change has been focused at unnesting several of the fields for simplicity, in such a way that, for example, instead of having to access event_dim.name (which would require unnesting and complicating the query), you can query directly the field event_name.

考虑到这一点,我相信您将能够使您的查询适应这个新模式,而且它可能看起来更简单,因为您不必取消嵌套这么多字段.

Having this in mind, I am sure you will be able to adapt your query to this new schema, and it will probably look way more simple, given that you will not have to unnest so many fields.

为了澄清起见,让我与您分享几个比较旧架构和新架构的示例 BQ 查询(它们使用公共 Firebase 表,因此您应该能够开箱即用地运行它们):

Just for clarification, let me share with you a couple of sample BQ queries comparing the old and the new schema (they are using public Firebase tables, so you should be able to run them out-of-the-box):

# Old Schema - UNNEST() required because there are nested fields
SELECT
  user_dim.app_info.app_instance_id,
  MIN(event.timestamp_micros) AS min_time,
  MAX(event.timestamp_micros) AS max_time,
  event.name
FROM
  `firebase-public-project.com_firebase_demo_ANDROID.app_events_20180503`,
  UNNEST(event_dim) AS event
WHERE
  event.name = "user_engagement"
GROUP BY
  user_dim.app_info.app_instance_id,
  event.name

相比:

# New Schema - UNNEST() not required because there are no nested fields
SELECT
  user_pseudo_id,
  MIN(event_timestamp) AS min_time,
  MAX(event_timestamp) AS max_time,
  event_name
FROM
  `firebase-public-project.analytics_153293282.events_20180815`
WHERE
  event_name = "user_engagement"
GROUP BY
  user_pseudo_id,
  event_name

这些查询是等效的,但引用具有旧模式和新模式的表.请注意,由于您的查询更复杂,您可能需要添加一些 UNNEST() 以访问表中剩余的嵌套字段.

These queries are equivalent, but referencing tables with the old and new schema. Please note that, as your query is more complex, you may need to add some UNNEST() in order to access the remaining nested fields in the table.

此外,您可能想看看这些示例可以帮助您了解如何使用新架构编写查询的一些想法.

Additionally, you may want to have a look at these samples that can help you with some ideas on how to write queries with the new schema.

编辑 2

我的理解是,像下面这样的查询应该允许您在单个语句中查询所有字段.我按所有非聚合/过滤字段进行分组,但根据您的用例(这绝对是您需要自己工作的东西),您可能希望应用不同的策略以便能够查询非- 分组字段(即使用最小/最大过滤器等).

My understanding is that a query like the one below should allow you to query for all the fields in a single statement. I am grouping by all the non-aggregated/filtered fields, but depending on your use case (this is definitely something you would need to work on your own) you may want to apply a different strategy in order to be able to query the non-grouped fields (i.e. use a MIN/MAX filter, etc.).

SELECT
  user_pseudo_id,
  MIN(event_timestamp) AS min_time,
  MAX(event_timestamp) AS max_time,
  event_name,
  par.value.int_value AS engagement_time
FROM
  `firebase-public-project.analytics_153293282.events_20180815`,
  UNNEST(event_params) as par
WHERE
  event_name = "user_engagement" AND par.key = "engagement_time_msec"
GROUP BY
  user_pseudo_id,
  event_name,
  par.value.int_value

<小时>

附件

migration_script.sql:

  SELECT
  @date AS event_date,
  event.timestamp_micros AS event_timestamp,
  event.previous_timestamp_micros AS event_previous_timestamp,
  event.name AS event_name,
  event.value_in_usd  AS event_value_in_usd,
   user_dim.bundle_info.bundle_sequence_id AS event_bundle_sequence_id,
  user_dim.bundle_info.server_timestamp_offset_micros as event_server_timestamp_offset,
  (
  SELECT
    ARRAY_AGG(STRUCT(event_param.key AS key,
        STRUCT(event_param.value.string_value AS string_value,
          event_param.value.int_value AS int_value,
          event_param.value.double_value AS double_value, 
          event_param.value.float_value AS float_value) AS value))
  FROM
    UNNEST(event.params) AS event_param) AS event_params,
  user_dim.first_open_timestamp_micros AS user_first_touch_timestamp,
  user_dim.user_id AS user_id,
  user_dim.app_info.app_instance_id AS user_pseudo_id,
  "" AS stream_id,
  user_dim.app_info.app_platform AS platform,
  STRUCT( user_dim.ltv_info.revenue AS revenue,
    user_dim.ltv_info.currency AS currency ) AS user_ltv,
  STRUCT( user_dim.traffic_source.user_acquired_campaign AS name,
      user_dim.traffic_source.user_acquired_medium AS medium,
      user_dim.traffic_source.user_acquired_source AS source ) AS traffic_source,
  STRUCT( user_dim.geo_info.continent AS continent,
    user_dim.geo_info.country AS country,
    user_dim.geo_info.region AS region,
    user_dim.geo_info.city AS city ) AS geo,
  STRUCT( user_dim.device_info.device_category AS category,
    user_dim.device_info.mobile_brand_name,
    user_dim.device_info.mobile_model_name,
    user_dim.device_info.mobile_marketing_name,
    user_dim.device_info.device_model AS mobile_os_hardware_model,
    @platform AS operating_system,
    user_dim.device_info.platform_version AS operating_system_version,
    user_dim.device_info.device_id AS vendor_id,
    user_dim.device_info.resettable_device_id AS advertising_id,
    user_dim.device_info.user_default_language AS language,
    user_dim.device_info.device_time_zone_offset_seconds AS time_zone_offset_seconds,
    IF(user_dim.device_info.limited_ad_tracking, "Yes", "No") AS is_limited_ad_tracking ) AS device,
  STRUCT( user_dim.app_info.app_id AS id,
    @firebase_app_id  AS firebase_app_id,
    user_dim.app_info.app_version AS version,
    user_dim.app_info.app_store AS install_source ) AS app_info,
  (
  SELECT
    ARRAY_AGG(STRUCT(user_property.key AS key,
        STRUCT(user_property.value.value.string_value AS string_value,
          user_property.value.value.int_value AS int_value,
          user_property.value.value.double_value AS double_value,
          user_property.value.value.float_value AS float_value,
          user_property.value.set_timestamp_usec AS set_timestamp_micros ) AS value))
  FROM
    UNNEST(user_dim.user_properties) AS user_property) AS user_properties
FROM
  `SCRIPT_GENERATED_TABLE_NAME`,
  UNNEST(event_dim) AS event

这篇关于如何在新架构中编写 Bigquery,并从 Firebase 分析替换旧架构中的 event_dim?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆