Dataflow job uses same BigQuery job ID when deploying using a staged template multiple times?


Problem description


I am attempting to deploy a Dataflow job that reads from BigQuery and writes to Cassandra on a fixed schedule. The template code is written in Java using Apache Beam and the Dataflow library. I have staged the template on Google Cloud Storage and have configured a Cloud Scheduler instance as well as a Cloud Function that triggers the Dataflow template. I am using the latest versions of all Beam and BigQuery dependencies.
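The trigger path described above boils down to calling the Dataflow `templates.launch` REST endpoint from the Cloud Function. A minimal sketch of building that request follows; the project ID, bucket path, and job name are hypothetical placeholders, and authentication (an OAuth bearer token) is omitted:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class LaunchTemplate {
    // Builds the Dataflow templates.launch endpoint for a staged template.
    // Both arguments here are hypothetical placeholders.
    static URI launchUri(String projectId, String gcsTemplatePath) {
        return URI.create("https://dataflow.googleapis.com/v1b3/projects/"
                + projectId + "/templates:launch?gcsPath=" + gcsTemplatePath);
    }

    public static void main(String[] args) {
        URI uri = launchUri("my-project", "gs://my-bucket/templates/bq-to-cassandra");
        // A fresh jobName per launch; in the real function the request would be
        // sent with java.net.http.HttpClient plus an Authorization header.
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"jobName\": \"bq-to-cassandra-" + System.currentTimeMillis() + "\"}"))
                .build();
        System.out.println(request.uri());
    }
}
```

Note that launching the template this way gives each run a unique Dataflow job name, but it does not by itself change the BigQuery job IDs baked into the staged template, which is the root of the 409 error below.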


However, I have discovered that when deploying a job from the same staged template, the BigQuery extract job always uses the same job ID, which causes a 409 error in the logs. The BigQuery query job succeeds because its job ID has a unique suffix appended, while the extract job ID reuses the same prefix without any suffix.


I have considered two alternative solutions: either using a crontab on a Compute Engine instance to deploy the pipeline directly rather than through a template, or adapting a Cloud Function to perform the same tasks as the Dataflow pipeline on a schedule. Ideally, if there is a way to change the extract job ID in the Dataflow job, that would be a much simpler fix, but I'm not sure whether it is possible. If it isn't, is there a better alternative?

Recommended answer


Based on the additional description, it sounds like this may be a case of not using withTemplateCompatibility() as directed. From the BigQueryIO documentation:

Usage with templates


When using read() or readTableRows() in a template, it's required to specify BigQueryIO.Read.withTemplateCompatibility(). Specifying this in a non-template pipeline is not recommended because it has somewhat lower performance.
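In Beam's Java SDK, the fix looks like the sketch below. The query string and pipeline options are placeholders; the substantive change is the withTemplateCompatibility() call, which makes the read generate fresh BigQuery job IDs on each template execution instead of reusing the IDs fixed at staging time:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class TemplateCompatiblePipeline {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).create();
        Pipeline p = Pipeline.create(options);

        p.apply("ReadFromBigQuery",
                BigQueryIO.readTableRows()
                        // Required when the pipeline is staged as a template:
                        // without it, every execution of the template reuses
                        // the BigQuery job IDs generated at staging time,
                        // producing the 409 "duplicate job" errors described above.
                        .withTemplateCompatibility()
                        .fromQuery("SELECT ... FROM `project.dataset.table`") // placeholder query
                        .usingStandardSql());

        // The write-to-Cassandra stage from the question would follow here.
        p.run();
    }
}
```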
