Firebase导出到BigQuery:保留群组查询 [英] Firebase exported to BigQuery: retention cohorts query
问题描述
为了解决这个问题,我在寻找BigQuery,因为Firebase Analytics提供了可用的方法将数据导出到此服务。
但我坚持许多问题和谷歌没有答案或例子,可能指向我的正确方向。
一般问题:
作为第一步,我需要汇总代表相同数据的firebase群组所做的数据,因此我可以确定我的计算是正确的:
< a href =https://i.stack.imgur.com/bdpPa.png =nofollow noreferrer>
下一步应该是对查询应用限制,以便它们匹配自定义用户属性。
在这里,我到目前为止:
主要问题 - 用户计算的差异很大。有时它约为100个用户,但有时接近1000个。
这是我使用的方法:
<$ p $
##
$ b $#在指定期间(w0 - 第1周)
#计数使用`user_dim.first_open_timestamp_micros`
#
# firebase组用户将同时购买
#(同一天或同一周内开始使用应用)
#https://support.google.com/firebase/answer/6317510
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id)as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx :xx_IOS.app_events_],
TIMESTAMP('2016-11-20'),
TIMESTAMP('2016-11-26')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros,'%Y-%m-%d')
BETWEEN'2016-11-20'和'2016-11-26'
# 2
#对于每个下一期的计数事件与
#相同first_open_timestamp
#下面是一周中的一个示例秒。
#week 0 is Nov20-Nov26,week 1 is Nov27-Dec03
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id)as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-27'),
TIMESTAMP ('2016-12-03')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros,'%Y-%m-%d')
BETWEEN '2016-11-20'和'2016-11-26'
#3
#现在我们每周都有用户w1,w2,... w5
#计算每个人的留存额
#留存率周1 = w1 / w0 * 100 = 25.72181359
#rw2 = w2 / w1 * 100
#...
#rw5 = w5 / w1 * 100
#4
#将星期0改为1并从步骤1开始重复
BigQuery查询提示请求
任何提示和指导建议复杂的查询可能会聚合并计算所有需要的数据对于这个任务在一个步骤中非常感谢。
with period_label
Firebase offer split testing functionality through Firebase remote configuration, but there are lack of ability to filter retention in cohorts sections with user properties (with any property in actual fact).
In quest of solution for this problem i'm looking for BigQuery, in reason of Firebase Analytics provide usable way to export data to this service.
But i stuck with many questions and google has no answer or example which may point me to the right direction.
General questions:
As first step i need to aggregate data which represent same data firebase cohorts do, so i can be sure my calculation is right:
Next step should be just apply constrains to the queries, so they match custom user properties.
Here what i get so far:
The main problem – big difference in users calculations. Sometimes it is about 100 users, but sometimes close to 1000.
This is approach i use:
# 1
# Count users with `user_dim.first_open_timestamp_micros`
# in specified period (w0 – week 1)
# this is the way firebase group users to cohorts
# (who started app on the same day or during the same week)
# https://support.google.com/firebase/answer/6317510
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-20'),
TIMESTAMP('2016-11-26')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'
# 2
# For each next period count events with
# same first_open_timestamp
# Here is example for one of the weeks.
# week 0 is Nov20-Nov26, week 1 is Nov27-Dec03
SELECT
COUNT(DISTINCT user_dim.app_info.app_instance_id) as count
FROM
(
TABLE_DATE_RANGE
(
[admob-app-id-xx:xx_IOS.app_events_],
TIMESTAMP('2016-11-27'),
TIMESTAMP('2016-12-03')
)
)
WHERE
STRFTIME_UTC_USEC(user_dim.first_open_timestamp_micros, '%Y-%m-%d')
BETWEEN '2016-11-20' AND '2016-11-26'
# 3
# Now we have users for each week w1, w2, ... w5
# Calculate retention for each of them
# retention week 1 = w1 / w0 * 100 = 25.72181359
# rw2 = w2 / w1 * 100
# ...
# rw5 = w5 / w1 * 100
# 4
# Shift week 0 by one and repeat from step 1
BigQuery queries tips request
Any tips and directions to go about building complex query which may aggregate and calculate all data required for this task in one step is very appreciated.
Here is BigQuery Export schema if needed
Side questions:
- why all the
user_dim.device_info.device_id
anduser_dim.device_info.resettable_device_id
isnull
? user_dim.app_info.app_id
is missing from the doc (if firebase support teammate will be read this question)- how
event_dim.timestamp_micros
andevent_dim.previous_timestamp_micros
should be used, i can not get their purpose.
PS
It will be good someone from Firebase teammate answer this question. Five month ago there are was one mention about extending cohorts functionality with filtering or show bigqueries examples, but things are not moving. Firebase Analytics is way to go they said, Google Analytics is deprecated, they said. Now i spend second day to lean bigquery and build my own solution over the existing analytics tools. I no, stack overflow is not the place for this comments, but guys are you thinking? Split testing may grammatically affect retention of my app. My app does not sold anything, funnels and events is not valuable metrics in many cases.
Any tips and directions to go about building complex query which may aggregate and calculate all data required for this task in one step is very appreciated.
yes, generic bigquery will work fine
Below is not the most generic version, but can give you an idea
In this example I am using Stack Overflow Data available in Google BigQuery Public Datasets
First sub-select – activities – in most cases the only what you need to re-write to reflect specifics of your data.
What it does is:
a. Defines period you want to set for analysis.
In example below - it is a month - FORMAT_DATE('%Y-%m', ...
But you can use year, week, day or anything else – respectively
• By year - FORMAT_DATE('%Y', DATE(answers.creation_date)) AS period
• By week - FORMAT_DATE('%Y-%W', DATE(answers.creation_date)) AS period
• By day - FORMAT_DATE('%Y-%m-%d', DATE(answers.creation_date)) AS period
• …
b. Also it "filters" only the type of events/activity you need to analyse
for example, `WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%' looks for answers for google-bigquery tagged question
The rest of sub-queries are more-less generic and mostly can be used as is
#standardSQL
WITH activities AS (
SELECT answers.owner_user_id AS id,
FORMAT_DATE('%Y-%m', DATE(answers.creation_date)) AS period
FROM `bigquery-public-data.stackoverflow.posts_answers` AS answers
JOIN `bigquery-public-data.stackoverflow.posts_questions` AS questions
ON questions.id = answers.parent_id
WHERE CONCAT('|', questions.tags, '|') LIKE '%|google-bigquery|%'
GROUP BY id, period
), cohorts AS (
SELECT id, MIN(period) AS cohort FROM activities GROUP BY id
), periods AS (
SELECT period, ROW_NUMBER() OVER(ORDER BY period) AS num
FROM (SELECT DISTINCT cohort AS period FROM cohorts)
), cohorts_size AS (
SELECT cohort, periods.num AS num, COUNT(DISTINCT activities.id) AS ids
FROM cohorts JOIN activities ON activities.period = cohorts.cohort AND cohorts.id = activities.id
JOIN periods ON periods.period = cohorts.cohort
GROUP BY cohort, num
), retention AS (
SELECT cohort, activities.period AS period, periods.num AS num, COUNT(DISTINCT cohorts.id) AS ids
FROM periods JOIN activities ON activities.period = periods.period
JOIN cohorts ON cohorts.id = activities.id
GROUP BY cohort, period, num
)
SELECT
CONCAT(cohorts_size.cohort, ' - ', FORMAT("%'d", cohorts_size.ids), ' users') AS cohort,
retention.num - cohorts_size.num AS period_lag,
retention.period as period_label,
ROUND(retention.ids / cohorts_size.ids * 100, 2) AS retention , retention.ids AS rids
FROM retention
JOIN cohorts_size ON cohorts_size.cohort = retention.cohort
WHERE cohorts_size.cohort >= FORMAT_DATE('%Y-%m', DATE('2015-01-01'))
ORDER BY cohort, period_lag, period_label
You can visualize result of above query with the tool of your choice
Note: you can use either period_lag or period_label
See the difference of their use in below examples
with period_lag
with period_label
这篇关于Firebase导出到BigQuery:保留群组查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!