Calling beam.io.WriteToBigQuery in a beam.DoFn


Problem Description

I've created a Dataflow template with some parameters. When I write the data to BigQuery, I would like to use these parameters to determine which table it should write to. I've tried calling WriteToBigQuery in a ParDo, as suggested in the following link.

How can I write to Big Query using a runtime value provider in Apache Beam?

The pipeline ran successfully, but it did not create or load any data into BigQuery. Any idea what the issue might be?

import apache_beam as beam
from apache_beam.io.gcp.bigquery import BigQueryDisposition
from apache_beam.options.pipeline_options import DebugOptions, PipelineOptions


class CustomOptions(PipelineOptions):
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_value_provider_argument('--year', type=int)
    parser.add_value_provider_argument('--secret_name', type=str)


class WritePlanDataToBigQuery(beam.DoFn):
  def __init__(self, year_vp):
    self._year_vp = year_vp

  def process(self, element):
    year = self._year_vp.get()

    table = f's4c.plan_data_{year}'
    schema = {
      'fields': [ ...some fields properties ]
    }

    beam.io.WriteToBigQuery(
      table=table,
      schema=schema,
      create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
      write_disposition=BigQueryDisposition.WRITE_TRUNCATE,
      method=beam.io.WriteToBigQuery.Method.FILE_LOADS
    )


def run():
  pipeline_options = PipelineOptions()
  pipeline_options.view_as(DebugOptions).experiments = ['use_beam_bq_sink']

  with beam.Pipeline(options=pipeline_options) as p:
    custom_options = pipeline_options.view_as(CustomOptions)

    # SplitYearToPeriod, GetPlanDataByPeriod and transform_record are
    # helper transforms defined elsewhere in the project.
    _ = (
      p
      | beam.Create([None])
      | 'Year to periods' >> beam.ParDo(SplitYearToPeriod(custom_options.year))
      | 'Read plan data' >> beam.ParDo(GetPlanDataByPeriod(custom_options.secret_name))
      | 'Transform record' >> beam.Map(transform_record)
      | 'Write to BQ' >> beam.ParDo(WritePlanDataToBigQuery(custom_options.year))
    )


if __name__ == '__main__':
  run()

Recommended Answer

You have instantiated the PTransform beam.io.gcp.bigquery.WriteToBigQuery inside the process method of your DoFn. There are a couple of problems here:

  • The process method is called for each element of the input PCollection. It is not used for building the pipeline graph. This approach to dynamically constructing the graph will not work.
  • Once you move it out of the DoFn, you need to apply the PTransform beam.io.gcp.bigquery.WriteToBigQuery to a PCollection for it to have any effect. See the Beam pydoc or the Beam tutorial documentation.

To create a derived value provider for your table name, you would need a "nested" value provider. Unfortunately, this is not supported by the Python SDK. You can, however, use a value provider option directly.

As an advanced option, you may be interested in trying out "flex templates", which essentially package your whole program as a Docker image and execute it with parameters.

