逐步将SQL迁移到BigQuery [英] Cloud SQL to BigQuery incrementally

本文介绍了逐步将SQL迁移到BigQuery的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

对于我正在使用的一种用例,我需要一些建议.

I need some suggestions for one of the use cases I am working on.

用例:

我们在Cloud SQL中有大约5到10个表的数据,其中一些被视为查找,而其他则被视为事务性.我们需要以某种方式将其发送到BigQuery,以便从其中生成3-4个表(平整,嵌套或非规范化表),这些表将用于Data Studio,Looker等中的报表.

We have data in Cloud SQL around 5-10 tables, some are treated as lookup and others transactional. We need to get this to BigQuery in a way to make 3-4 tables(Flattened, Nested or Denormalized) out of these which will be used for reporting in Data Studio, Looker, etc.

应该对数据进行增量处理,并且每5分钟可能会发生一次Cloud SQL更改,这意味着BigQuery报表应该几乎可以实时获取数据.如果可能的话,我们还应该保留数据更改的历史记录.

Data should be processed incrementally and changes in Cloud SQL could happen every 5 min, which means that data should be available to BigQuery reporting in near real-time. If possible we should also maintain the history of Data change.

解决方案:

我们在BigQuery中保留了3层,因此来自Cloud SQL的数据进入第一层,然后经过展平后将其保存到第二层(保留历史记录),然后进行处理以插入到第三层(报告) . 问题是因为该方法是为日常处理而开发的,无助于每5分钟更改一次Cloud SQL中的数据更改.

We kept 3 layers in BigQuery, so data from Cloud SQL comes to the first layer, then after flattening we save it to the second layer (which maintains history), and then processes it to insert in the third layer(Reporting). The problem with this approach is that it was developed for daily processing and can not help in processing every 5 min change of data in Cloud SQL.

欢迎任何建议.

注意:-,我们将要求使用Dataflow,VM或其他方式进行某些逻辑处理,这意味着我们不能仅使用计划查询,因为我们不能使用查询来处理所有逻辑,但是可以的,可以混合使用视图和代码.

Note:- We would require to use Dataflow, VM or some other way to do some logic processing, means we can not use only scheduled query as we can not use query to process all logic, but yes mix of view and code can be used.

推荐答案

这比您想象的要简单:BigQuery可以直接从Cloud SQL中读取数据.

This is more straightforward than you might think: BigQuery can read data straight out of your Cloud SQL.

文档:

更多阅读:

因此,您所需要做的就是编写一个MySQL查询,以查找最近5分钟内的所有更改.您可以在BigQuery中运行此查询-并使用以下查询将所有数据具体化为BigQuery:

So all you need to do is write a MySQL query that finds all the changes from the last 5 minutes. You can run this query from within BigQuery - and materialize all the data into BigQuery with one query like:

INSERT INTO `dataset.table`

SELECT * 
FROM EXTERNAL_QUERY(
  'your.CloudSql.database'
  , 'SELECT * FROM users')

这篇关于逐步将SQL迁移到BigQuery的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆