How to run a BigQuery scheduled query as soon as another BigQuery scheduled query has run

Problem description

I am having trouble triggering one scheduled BigQuery query from another scheduled BigQuery query. I have two BigQuery tables that are loaded whenever a file is placed in a Google Cloud Storage bucket. Two Dataflow jobs drive that process and load the tables. Using a Cloud Function and a Pub/Sub topic, I trigger scheduled queries for these two tables. It was easy to create a log filter for the load job that triggers my scheduled query, for example:

protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobStatus.state="DONE"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.load.destinationTable.tableId="table_name"
protoPayload.serviceData.jobCompletedEvent.job.referencedTables.datasetId="big query dataset name"
protoPayload.serviceData.jobCompletedEvent.job.referencedTables.projectId="my project id"

This works fine and the scheduled query is triggered. I have a similar check for the other table.

Now the problem: I need to trigger another scheduled query once my previous scheduled query has completed successfully. The only thing I can think of is to filter the logs on the following:

protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobStatus.state="DONE"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.labels.data_source_id="scheduled_query"

However, this gives me logs for all scheduled queries running in my project, whereas I am looking for a specific one. The scheduled query that runs earlier contains many INSERT statements and a few UPDATE statements, so the table ID is not populated there. The only unique thing is the query text that is present in each scheduled query's logs. Do I need to filter the logs on the last of the multiple queries that run as part of my scheduled query, so that a match indicates the scheduled query has completed? For example:

protoPayload.methodName="jobservice.jobcompleted"
protoPayload.serviceData.jobCompletedEvent.job.jobStatus.state="DONE"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.labels.data_source_id="scheduled_query"
protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.query="UPDATE `whr-asia-datalake-nonprod.WHR_DATALAKE.CONSUMER_EXTRACT` CEXT SET CEXT.CUST_MOBILE=If ( length(SAFE_CAST(CEXT.CUST_MOBILE AS STRING))=10,SAFE_CAST(CONCAT('91',CEXT.CUST_MOBILE) AS Numeric),CEXT.CUST_MOBILE) WHERE TRUE"

However, this filter does not return any logs. Please advise, as I am stuck and need to deploy this to production.

Recommended answer

You can use the Pub/Sub notification feature of scheduled queries to get updates on the execution of each run. In a Cloud Function, receive and process the Pub/Sub messages and check the value of the state; when the run is done, trigger your next scheduled query the same way you did the first time.
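A minimal sketch of such a Cloud Function follows, written in Python. It assumes the Pub/Sub message body is the JSON form of the transfer run, with a `name` of the form `<transfer config name>/runs/<run id>` and a `state` of `SUCCEEDED` on success. The two config resource names are hypothetical placeholders, and `start_downstream` assumes the `google-cloud-bigquery-datatransfer` client library is installed; verify the message fields against a real notification before relying on them.

```python
import base64
import datetime
import json

# Hypothetical resource names; replace with your own transfer configs.
UPSTREAM_CONFIG = "projects/123/locations/us/transferConfigs/upstream-id"
DOWNSTREAM_CONFIG = "projects/123/locations/us/transferConfigs/downstream-id"


def decode_run(event: dict) -> dict:
    """Decode the base64-encoded JSON payload of a Pub/Sub event."""
    return json.loads(base64.b64decode(event["data"]).decode("utf-8"))


def should_trigger(run: dict, upstream_config: str) -> bool:
    """True if the run belongs to the upstream config and succeeded."""
    # Assumption: a run name looks like "<config name>/runs/<run id>".
    belongs = run.get("name", "").startswith(upstream_config + "/runs/")
    return belongs and run.get("state") == "SUCCEEDED"


def start_downstream(config_name: str) -> None:
    """Start a manual run of the downstream scheduled query."""
    # Imported lazily so the decision logic can be tested without the library.
    from google.cloud import bigquery_datatransfer_v1

    client = bigquery_datatransfer_v1.DataTransferServiceClient()
    client.start_manual_transfer_runs(
        request={
            "parent": config_name,
            "requested_run_time": datetime.datetime.now(datetime.timezone.utc),
        }
    )


def on_transfer_run(event, context, trigger=start_downstream):
    """Pub/Sub-triggered entry point: chain the downstream query on success."""
    run = decode_run(event)
    if should_trigger(run, UPSTREAM_CONFIG):
        trigger(DOWNSTREAM_CONFIG)
        return True
    # Ignore runs of other configs and runs that did not succeed.
    return False
```

Matching on the run's `name` rather than on the query text is what lets the function pick out one specific scheduled query. The `trigger` parameter exists only so the decision logic can be exercised without calling the real service.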
