Does Dataflow templating support template input for BigQuery sink options?


Question

As I have a working static Dataflow running, I'd like to create a template from it so I can easily reuse the Dataflow without any command-line typing.

The official Creating Templates tutorial does not provide a sample for a templatable output.

我的数据流"以BigQuery接收器结尾,该接收器接受一些参数(例如目标表)进行存储.这个确切的参数是我想在模板中使用的参数,允许我在运行流程之后选择目标存储.

My Dataflow ends with a BigQuery sink which takes a few arguments like the target table for storage. This exact parameter is the one I'd like to make available in my template allowing me to choose the target storage after running the flow.

But I'm not able to get this working. Below I paste some code snippets that should help explain the exact issue I have.

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQuerySink, BigQueryDisposition
from apache_beam.options.pipeline_options import PipelineOptions


class CustomOptions(PipelineOptions):
    @classmethod
    def _add_argparse_args(cls, parser):
        # Template parameters, exposed as ValueProviders.
        parser.add_value_provider_argument(
            '--input',
            default='gs://my-source-bucket/file.json')
        parser.add_value_provider_argument(
            '--table',
            default='my-project-id:some-dataset.some-table')


pipeline_options = PipelineOptions()

pipe = beam.Pipeline(options=pipeline_options)

custom_options = pipeline_options.view_as(CustomOptions)

(...)

# store
processed_pipe | beam.io.Write(BigQuerySink(
    table=custom_options.table.get(),  # .get() called at construction time
    schema='a_column:STRING,b_column:STRING,etc_column:STRING',
    create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=BigQueryDisposition.WRITE_APPEND
))
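For reference, the template itself is created by running the pipeline with a --template_location option. A minimal sketch of how those flags can be passed programmatically (the GCS paths are placeholders; the same flags work on the command line):

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Stages the pipeline graph as a template instead of executing it.
pipeline_options = PipelineOptions([
    '--runner=DataflowRunner',
    '--project=my-project-id',
    '--staging_location=gs://my-bucket/staging',
    '--temp_location=gs://my-bucket/temp',
    '--template_location=gs://my-bucket/templates/my-template',
])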

When creating the template, I did not pass any parameters with it. Within a split second I get the following error message:

apache_beam.error.RuntimeValueProviderError: RuntimeValueProvider(option: table, type: str, default_value: 'my-project-id:some-dataset.some-table').get() not called from a runtime context

When I add a --table parameter at template creation, the template is created, but the --table value is then hardcoded into the template and is not overridden by any table value given when launching the template later.

I get the same error when I replace table=custom_options.table.get() with table=StaticValueProvider(str, custom_options.table.get()).
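The inner .get() is still evaluated while the pipeline graph is being constructed, before any runtime value exists, so wrapping its result cannot defer it. A minimal sketch of the distinction (custom_options as defined above; the import path assumes a Beam 2.x layout):

from apache_beam.options.value_provider import StaticValueProvider

# Evaluated eagerly at construction time -- this is exactly what raises
# the RuntimeValueProviderError shown above:
table = StaticValueProvider(str, custom_options.table.get())

# A StaticValueProvider is only meant for values already known when the
# pipeline is constructed:
table = StaticValueProvider(str, 'my-project-id:some-dataset.some-table')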

Has anyone already built a templatable Dataflow with customisable BigQuerySink parameters? I'd love to get some hints on this.

Answer

Python currently only supports ValueProvider options for FileBasedSource IOs. You can see this by clicking the Python tab at the link you mentioned, https://cloud.google.com/dataflow/docs/templates/creating-templates, under the "Pipeline I/O and runtime parameters" section.

Unlike in Java, BigQuery in Python does not use a custom source. In other words, it is not fully implemented in the SDK but also has parts in the backend (it is therefore a "native source"), and only custom sources can use templates. There are plans to add BigQuery as a custom source: issues.apache.org/jira/browse/BEAM-1440
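Once that lands, the sink should accept the ValueProvider itself rather than its .get() result. A hedged sketch of what the store step from the question could then look like, assuming an SDK version where WriteToBigQuery accepts a ValueProvider for the table:

# Sketch only -- assumes a Beam release in which BEAM-1440 is resolved.
processed_pipe | beam.io.WriteToBigQuery(
    table=custom_options.table,  # pass the ValueProvider, no .get()
    schema='a_column:STRING,b_column:STRING,etc_column:STRING',
    create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
    write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)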
