我是否可以在BigQuery中可靠地查询Firebase日内表格并获取100%的事件数据? [英] Can I reliably query the Firebase intraday tables in BigQuery and get 100% of the event data?

查看:235
本文介绍了我是否可以在BigQuery中可靠地查询Firebase日内表格并获取100%的事件数据?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个Firebase项目(一个iOS和一个Android)进入Bigquery。我需要将来自两个项目的某些特定数据合并,拼合并汇总到一个组合表中,以便在不查询所有日常表中的所有bazillion行的情况下报告它。



<为了填充这个聚合表,我目前有两个python脚本,每5分钟查询iOS和Android日内表。该脚本从聚合表中获取最大时间戳,然后查询intraday表以获取具有更大时间戳记的任何记录(我分别跟踪iOS和Android的最大时间戳,因为它们经常不同)。



我正在用这个(缩写)通配符语法查询intraday表:

  SELECT yadda,yadda,timestamp_micros ,'ios'as platform 
FROM`myproject.iOSapp.app_events_intraday *`
WHERE timestamp_micros> (从myAggregateTable中选择max(timestamp_micros)
WHERE platform ='ios')

当日内表翻转到新的一天时,如果我的脚本在23:57和00:02再次运行,我会错过任何记录?

解决方案

我想我会在几个月后发布我的测试结果。以下是我看到的基本机制:


  1. DAY1 intraday表是在午夜GMT创建( xyz.app_events_intraday_20180101

  2. DAY2 盘中表格在24小时后创建( xyz.app_events_intraday_20180102 ),但 DAY1 盘中表格坚持了几个小时
  3. >
  4. 最后, DAY1 表被重命名为 xyz.app_events_20180101 , (当前)日内表

我的测试表明,额外的数据被添加到 app_events _ * 表,即使在步骤3发生之后,所以假设数据一旦名称改变就是稳定/静态的,那么它是 NOT 。我有新的数据出现2或3天后。


I have two Firebase projects (one iOS and one Android) feeding into Bigquery. I need to combine, flatten, and aggregate some specific data from both projects into one combined table so that I can report off of it without querying all bazillion rows across all daily tables.

In order to populate this aggregate table, I currently have two python scripts querying the iOS and Android intraday tables every 5 minutes. The script gets the max timestamp from the aggregate table, then queries the intraday table to get any records with a greater timestamp (I track the max timestamp separately for iOS and Android because they frequently differ).

I am querying the intraday table with this (abbreviated) wildcard syntax:

SELECT yadda, yadda, timestamp_micros, 'ios' as platform
FROM `myproject.iOSapp.app_events_intraday*`
WHERE timestamp_micros > (Select max(timestamp_micros) 
                          from myAggregateTable WHERE platform = 'ios' )

Is there any danger that when the intraday table flips over to the new day, I will miss any records when my script runs at 23:57 and then again at 00:02?

解决方案

I thought I would post the results of my testing this for a few months. Here are the basic mechanics as I see them:

  1. New DAY1 intraday table is created at midnight GMT (xyz.app_events_intraday_20180101)
  2. New DAY2 intraday table is created 24 hours later (xyz.app_events_intraday_20180102), but DAY1 intraday table sticks around for a few hours
  3. Eventually, DAY1 table is "renamed" to xyz.app_events_20180101 and you are left with a single (current) intraday table

My tests have shown that additional data is added to the app_events_* tables, even after step 3 has taken place, so it is NOT safe to assume that the data is stable/static once the name has changed. I have new data appear up to 2 or 3 days later.

这篇关于我是否可以在BigQuery中可靠地查询Firebase日内表格并获取100%的事件数据?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆