Dataflow job uses same BigQuery job ID when deploying using a staged template multiple times?


Question


I am attempting to deploy a Dataflow job that reads from BigQuery and writes to Cassandra on a fixed schedule. The template code is written in Java using Apache Beam and the Dataflow library. I have staged the template to Google Cloud Storage, and have configured a Cloud Scheduler instance as well as a Cloud Function used to trigger the Dataflow template. I am using the latest versions of all Beam and BigQuery dependencies.


However, I have discovered that when deploying a job using the same staged template, the BigQuery extract job always seems to use the same job ID, which causes a 409 Conflict failure in the logs. The BigQuery query job seems to succeed because the query job ID has a unique suffix appended, while the extract job ID reuses the same prefix without any suffix.


I have considered two alternative solutions: either using a crontab on a Compute Engine instance to deploy the pipeline directly, or adapting a Cloud Function to perform the same tasks as the Dataflow pipeline on a schedule. Ideally, if there is a way to change the extract job ID in the Dataflow job, that would be a much simpler solution, but I'm not sure whether this is possible. If it is not, is there a more optimal alternative?

Answer


Based on the additional description, it sounds like this may be a case of not using withTemplateCompatibility() as directed?

Use with templates


When using read() or readTableRows() in a template, it's required to specify BigQueryIO.Read.withTemplateCompatibility(). Specifying this in a non-template pipeline is not recommended because it has somewhat lower performance.
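A minimal sketch of what this looks like in a templated pipeline: chaining withTemplateCompatibility() onto the BigQueryIO read so that the read re-resolves its BigQuery jobs each time the staged template is launched, rather than reusing IDs fixed at template-staging time. The project, dataset, and table names here are hypothetical placeholders, and the downstream Cassandra write is elided.

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class TemplateCompatibleRead {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    PCollection<TableRow> rows =
        p.apply("ReadFromBigQuery",
            BigQueryIO.readTableRows()
                // Hypothetical query; substitute your own source table.
                .fromQuery("SELECT * FROM `my_project.my_dataset.my_table`")
                .usingStandardSql()
                // Required for reads inside a template: makes the read
                // template-compatible so each launch of the staged template
                // runs fresh BigQuery query/extract jobs instead of reusing
                // the job ID captured when the template was constructed.
                .withTemplateCompatibility());

    // ... downstream transforms (e.g. the write to Cassandra) go here ...
    p.run();
  }
}
```

With this in place, each launch of the staged template should generate distinct BigQuery job IDs, avoiding the 409 Conflict on the extract job.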

