Does Dataflow templating support template input for BigQuery sink options?

Question

As I have a working static Dataflow running, I'd like to create a template from it so I can easily reuse the Dataflow without any command-line typing.

Following the official Creating Templates tutorial doesn't provide a sample for a templatable output.

My Dataflow ends with a BigQuery sink which takes a few arguments, such as the target table for storage. This exact parameter is the one I'd like to make available in my template, allowing me to choose the target storage after running the flow.

But I'm not able to get this working. Below I paste some code snippets which could help explain the exact issue I have.

import apache_beam as beam
from apache_beam.io import BigQuerySink
from apache_beam.io.gcp.bigquery import BigQueryDisposition
from apache_beam.options.pipeline_options import PipelineOptions

class CustomOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # ValueProvider arguments are meant to be resolved at runtime.
        parser.add_value_provider_argument(
            '--input',
            default='gs://my-source-bucket/file.json')
        parser.add_value_provider_argument(
            '--table',
            default='my-project-id:some-dataset.some-table')

pipeline_options = PipelineOptions()

pipe = beam.Pipeline(options=pipeline_options)

custom_options = pipeline_options.view_as(CustomOptions)

(...)

# store
processed_pipe | beam.io.Write(BigQuerySink(
    # This .get() call is evaluated at pipeline construction time,
    # which is where the error below is raised.
    table=custom_options.table.get(),
    schema='a_column:STRING,b_column:STRING,etc_column:STRING',
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
))

When creating the template, I did not give any parameters with it. Within a split second, I get the following error message:

apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: table, type: str, default_value: 'my-project-id:some-dataset.some-table').get() not called from a runtime context

When I add a --table parameter at template creation, the template is created, but the --table parameter value is then hardcoded in the template and not overridden by any template value given for table later.

When I replace table=custom_options.table.get() with table=StaticValueProvider(str, custom_options.table.get()), I get the same error.

Is there someone who already built a templatable Dataflow with customisable BigQuerySink parameters? I'd love to get some hints on this.

Answer

Python currently only supports ValueProvider options for FileBasedSource IOs. You can see that by clicking on the Python tab at the link you mentioned, https://cloud.google.com/dataflow/docs/templates/creating-templates, under the "Pipeline I/O and runtime parameters" section.
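
For reference, here is a minimal sketch of the pattern that does work in Python: hand the ValueProvider object itself to a file-based source such as ReadFromText, without ever calling .get() during pipeline construction. It reuses the --input option from the question; the TemplateOptions name is just illustrative.

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

class TemplateOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        parser.add_value_provider_argument(
            '--input',
            default='gs://my-source-bucket/file.json')

pipeline_options = PipelineOptions()
custom_options = pipeline_options.view_as(TemplateOptions)

pipe = beam.Pipeline(options=pipeline_options)

# Pass the ValueProvider itself; ReadFromText resolves it at runtime,
# so the value can be supplied when the template is executed.
lines = pipe | beam.io.ReadFromText(custom_options.input)

pipe.run()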

Unlike what happens in Java, BigQuery in Python does not use a custom source. In other words, it is not fully implemented in the SDK but also contains parts in the backend (it is therefore a "native source"). Only custom sources can use templates. There are plans to have BigQuery added as a custom source: https://issues.apache.org/jira/browse/BEAM-1440
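
Until that happens, one possible workaround (my own sketch, not part of the original answer, and assuming the google-cloud-bigquery client library is available on the workers) is to defer the table lookup to runtime yourself: read the ValueProvider inside a DoFn and write rows with the BigQuery client directly. This bypasses BigQuerySink entirely, so you lose its batching and disposition handling; treat it as a proof of concept only.

import apache_beam as beam

class WriteToBigQueryAtRuntime(beam.DoFn):
    """Writes dict rows to BigQuery, resolving the table name at runtime."""

    def __init__(self, table_vp):
        self._table_vp = table_vp  # a ValueProvider; not resolved yet

    def start_bundle(self):
        from google.cloud import bigquery  # imported here so it runs on the worker
        self._client = bigquery.Client()
        # .get() is allowed here because we are now in a runtime context.
        # Convert 'project:dataset.table' to the client's 'project.dataset.table'.
        self._table = self._table_vp.get().replace(':', '.')

    def process(self, row):
        errors = self._client.insert_rows_json(self._table, [row])
        if errors:
            raise RuntimeError('BigQuery insert failed: %s' % errors)

# Usage, replacing the Write(BigQuerySink(...)) step from the question:
# processed_pipe | beam.ParDo(WriteToBigQueryAtRuntime(custom_options.table))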
