Firebase导出到BigQuery:保留群组查询 [英] Firebase exported to BigQuery: retention cohorts query

查看:372
本文介绍了Firebase导出到BigQuery:保留群组查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

Firebase通过Firebase远程配置提供分割测试功能,但缺乏过滤具有用户属性的队列部分的保留(实际上具有任何属性)。



为了解决这个问题,我在寻找BigQuery,因为Firebase Analytics提供了可用的方法将数据导出到此服务。



但我坚持许多问题和谷歌没有答案或例子,可能指向我的正确方向。



一般问题:



作为第一步,我需要汇总代表相同数据的firebase群组所做的数据,因此我可以确定我的计算是正确的:

< a href =https://i.stack.imgur.com/bdpPa.png =nofollow noreferrer>



下一步应该是对查询应用限制,以便它们匹配自定义用户属性。



在这里,我到目前为止:





主要问题 - 用户计算的差异很大。有时它约为100个用户,但有时接近1000个。



这是我使用的方法:

<$ p $
##
$ b $#在指定期间(w0 - 第1周)
#计数使用`user_dim.first_open_timestamp_micros`

# firebase组用户将同时购买
#(同一天或同一周内开始使用应用)
#https://support.google.com/firebase/answer/6317510

SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id)as count
FROM

TABLE_DATE_RANGE

[admob-app-id-xx :xx_IOS.app_events_],
TIMESTAMP('2016-11-20'),
TIMESTAMP('2016-11-26')


WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros,'%Y-%m-%d')
BETWEEN'2016-11-20'和'2016-11-26'

# 2

#对于每个下一期的计数事件与
#相同first_open_timestamp
#下面是一周中的一个示例秒。
#week 0 is Nov20-Nov26,week 1 is Nov27-Dec03

SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id)as count
FROM

TABLE_DATE_RANGE

[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-27'),
TIMESTAMP ('2016-12-03')


WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros,'%Y-%m-%d')
BETWEEN '2016-11-20'和'2016-11-26'

#3

#现在我们每周都有用户w1,w2,... w5
#计算每个人的留存额
#留存率周1 = w1 / w0 * 100 = 25.72181359
#rw2 = w2 / w1 * 100
#...
#rw5 = w5 / w1 * 100

#4

#将星期0改为1并从步骤1开始重复

BigQuery查询提示请求

任何提示和指导建议复杂的查询可能会聚合并计算所有需要的数据对于这个任务在一个步骤中非常感谢。





with period_label




Firebase offer split testing functionality through Firebase remote configuration, but there are lack of ability to filter retention in cohorts sections with user properties (with any property in actual fact).

In quest of solution for this problem i'm looking for BigQuery, in reason of Firebase Analytics provide usable way to export data to this service.

But i stuck with many questions and google has no answer or example which may point me to the right direction.

General questions:

As first step i need to aggregate data which represent same data firebase cohorts do, so i can be sure my calculation is right:

Next step should be just apply constrains to the queries, so they match custom user properties.

Here what i get so far:

The main problem – big difference in users calculations. Sometimes it is about 100 users, but sometimes close to 1000.

This is approach i use:

# 1

# Count users with `user_dim.first_open_timestamp_micros` 
# in specified period (w0 – week 1)
# this is the way firebase group users to cohorts 
# (who started app on the same day or during the same week) 
# https://support.google.com/firebase/answer/6317510

SELECT
  COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
  (
   TABLE_DATE_RANGE
    (
     [admob-app-id-xx:xx_IOS.app_events_], 
     TIMESTAMP('2016-11-20'), 
     TIMESTAMP('2016-11-26')
    )
  )
WHERE
  STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
  BETWEEN '2016-11-20' AND '2016-11-26'

# 2

# For each next period count events with 
# same first_open_timestamp
# Here is example for one of the weeks. 
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03

SELECT
  COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
  (
   TABLE_DATE_RANGE
    (
     [admob-app-id-xx:xx_IOS.app_events_], 
     TIMESTAMP('2016-11-27'), 
     TIMESTAMP('2016-12-03')
    )
  )
WHERE
  STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
  BETWEEN '2016-11-20' AND '2016-11-26'

# 3

# Now we have users for each week w1, w2, ... w5
# Calculate retention for each of them
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w1 * 100
# ...
# rw5 = w5 / w1 * 100

# 4 

# Shift week 0 by one and repeat from step 1

BigQuery queries tips request

Any tips and directions to go about building complex query which may aggregate and calculate all data required for this task in one step is very appreciated.

Here is BigQuery Export schema if needed

Side questions:

  • why all the user_dim.device_info.device_id and user_dim.device_info.resettable_device_idis null?
  • user_dim.app_info.app_id is missing from the doc (if firebase support teammate will be read this question)
  • how event_dim.timestamp_micros and event_dim.previous_timestamp_micros should be used, i can not get their purpose.

PS

It will be good someone from Firebase teammate answer this question. Five month ago there are was one mention about extending cohorts functionality with filtering or show bigqueries examples, but things are not moving. Firebase Analytics is way to go they said, Google Analytics is deprecated, they said. Now i spend second day to lean bigquery and build my own solution over the existing analytics tools. I no, stack overflow is not the place for this comments, but guys are you thinking? Split testing may grammatically affect retention of my app. My app does not sold anything, funnels and events is not valuable metrics in many cases.

解决方案

Any tips and directions to go about building complex query which may aggregate and calculate all data required for this task in one step is very appreciated.

yes, generic bigquery will work fine

Below is not the most generic version, but can give you an idea
In this example I am using Stack Overflow Data available in Google BigQuery Public Datasets

First sub-select – activities – in most cases the only what you need to re-write to reflect specifics of your data.
What it does is:
a. Defines period you want to set for analysis.
In example below - it is a month - FORMAT_DATE('%Y-%m', ...
But you can use year, week, day or anything else – respectively
• By year - FORMAT_DATE('%Y', DATE(answers.creation_date)) AS period
• By week - FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS period
• By day - FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS period
• …
b. Also it "filters" only the type of events/activity you need to analyse
for example, `WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' looks for answers for google-bigquery tagged question

The rest of sub-queries are more-less generic and mostly can be used as is

#standardSQL
WITH activities AS (
  SELECT answers.owner_user_id AS id,
    FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
  FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
  JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
  ON questions.id = answers.parent_id
  WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' 
  GROUP BY id, period
), cohorts AS (
  SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
), periods AS (
  SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
  FROM (SELECT DISTINCT cohort AS period FROM cohorts)
), cohorts_size AS (
  SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids 
  FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
  JOIN periods ON periods.period = cohorts.cohort
  GROUP BY cohort, num
), retention AS (
  SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
  FROM periods JOIN activities ON activities.period = periods.period
  JOIN cohorts ON cohorts.id = activities.id 
  GROUP BY cohort, period, num 
)
SELECT 
  CONCAT(cohorts_size.cohort, ' - ',  FORMAT("%'d", cohorts_size.ids), ' users') AS cohort, 
  retention.num - cohorts_size.num AS period_lag, 
  retention.period as period_label,
  ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention , retention.ids AS rids
FROM retention
JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
ORDER BY cohort, period_lag, period_label  

You can visualize result of above query with the tool of your choice
Note: you can use either period_lag or period_label
See the difference of their use in below examples

with period_lag

with period_label

这篇关于Firebase导出到BigQuery:保留群组查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆