我是否可以在BigQuery中可靠地查询Firebase日内表格并获取100%的事件数据? [英] Can I reliably query the Firebase intraday tables in BigQuery and get 100% of the event data?
问题描述
我有两个Firebase项目(一个iOS和一个Android)进入Bigquery。我需要将来自两个项目的某些特定数据合并,拼合并汇总到一个组合表中,以便在不查询所有日常表中的所有bazillion行的情况下报告它。
<为了填充这个聚合表,我目前有两个python脚本,每5分钟查询iOS和Android日内表。该脚本从聚合表中获取最大时间戳,然后查询intraday表以获取具有更大时间戳记的任何记录(我分别跟踪iOS和Android的最大时间戳,因为它们经常不同)。
我正在用这个(缩写)通配符语法查询intraday表:
SELECT yadda,yadda,timestamp_micros ,'ios'as platform
FROM`myproject.iOSapp.app_events_intraday *`
WHERE timestamp_micros> (从myAggregateTable中选择max(timestamp_micros)
WHERE platform ='ios')
当日内表翻转到新的一天时,如果我的脚本在23:57和00:02再次运行,我会错过任何记录?
我想我会在几个月后发布我的测试结果。以下是我看到的基本机制:
- 新
DAY1
intraday表是在午夜GMT创建(xyz.app_events_intraday_20180101
) - 新
DAY2
盘中表格在24小时后创建(xyz.app_events_intraday_20180102
),但DAY1
盘中表格坚持了几个小时 > - 最后,
DAY1
表被重命名为xyz.app_events_20180101
, (当前)日内表
我的测试表明,额外的数据被添加到 app_events _ *
表,即使在步骤3发生之后,所以假设数据一旦名称改变就是稳定/静态的,那么它是 NOT 。我有新的数据出现2或3天后。
I have two Firebase projects (one iOS and one Android) feeding into Bigquery. I need to combine, flatten, and aggregate some specific data from both projects into one combined table so that I can report off of it without querying all bazillion rows across all daily tables.
In order to populate this aggregate table, I currently have two python scripts querying the iOS and Android intraday tables every 5 minutes. The script gets the max timestamp from the aggregate table, then queries the intraday table to get any records with a greater timestamp (I track the max timestamp separately for iOS and Android because they frequently differ).
I am querying the intraday table with this (abbreviated) wildcard syntax:
SELECT yadda, yadda, timestamp_micros, 'ios' as platform
FROM `myproject.iOSapp.app_events_intraday*`
WHERE timestamp_micros > (Select max(timestamp_micros)
from myAggregateTable WHERE platform = 'ios' )
Is there any danger that when the intraday table flips over to the new day, I will miss any records when my script runs at 23:57 and then again at 00:02?
I thought I would post the results of my testing this for a few months. Here are the basic mechanics as I see them:
- New
DAY1
intraday table is created at midnight GMT (xyz.app_events_intraday_20180101
) - New
DAY2
intraday table is created 24 hours later (xyz.app_events_intraday_20180102
), butDAY1
intraday table sticks around for a few hours - Eventually,
DAY1
table is "renamed" toxyz.app_events_20180101
and you are left with a single (current) intraday table
My tests have shown that additional data is added to the app_events_*
tables, even after step 3 has taken place, so it is NOT safe to assume that the data is stable/static once the name has changed. I have new data appear up to 2 or 3 days later.
这篇关于我是否可以在BigQuery中可靠地查询Firebase日内表格并获取100%的事件数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!