Calling beam.io.WriteToBigQuery in a beam.DoFn


Question

I've created a Dataflow template with some parameters. When I write the data to BigQuery, I would like to use these parameters to determine which table it should write to. I've tried calling WriteToBigQuery in a ParDo, as suggested in the following link.

How do I write to Big Query using a runtime value provider in Apache Beam?

The pipeline ran successfully, but it did not create tables or load data into BigQuery. Any idea what the issue might be?

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, DebugOptions
from apache_beam.io.gcp.bigquery import BigQueryDisposition

def run():
  pipeline_options = PipelineOptions()
  pipeline_options.view_as(DebugOptions).experiments = ['use_beam_bq_sink']

  with beam.Pipeline(options=pipeline_options) as p:
    custom_options = pipeline_options.view_as(CustomOptions)

    _ = (
      p
      | beam.Create([None])
      | 'Year to periods' >> beam.ParDo(SplitYearToPeriod(custom_options.year))
      | 'Read plan data' >> beam.ParDo(GetPlanDataByPeriod(custom_options.secret_name))
      | 'Transform record' >> beam.Map(transform_record)
      | 'Write to BQ' >> beam.ParDo(WritePlanDataToBigQuery(custom_options.year))
    )

if __name__ == '__main__':
  run()

class CustomOptions(PipelineOptions):
  @classmethod
  def _add_argparse_args(cls, parser):
    parser.add_value_provider_argument('--year', type=int)
    parser.add_value_provider_argument('--secret_name', type=str)

class WritePlanDataToBigQuery(beam.DoFn):
  def __init__(self, year_vp):
    self._year_vp = year_vp

  def process(self, element):
    year = self._year_vp.get()

    table = f's4c.plan_data_{year}'
    schema = {
      'fields': [ ...some fields properties ]
    }

    beam.io.WriteToBigQuery(
      table=table,
      schema=schema,
      create_disposition=BigQueryDisposition.CREATE_IF_NEEDED,
      write_disposition=BigQueryDisposition.WRITE_TRUNCATE,
      method=beam.io.WriteToBigQuery.Method.FILE_LOADS
    )

Answer

You have instantiated the PTransform beam.io.gcp.bigquery.WriteToBigQuery inside the process method of your DoFn. There are a couple of problems here:

  • The process method is called for each element of the input PCollection. It is not used for building the pipeline graph, so this approach to dynamically constructing the graph will not work.
  • Once you move it out of the DoFn, you need to apply the PTransform beam.io.gcp.bigquery.WriteToBigQuery to a PCollection for it to have any effect. See the Beam pydoc or the Beam tutorial documentation.
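To illustrate the second point, here is a minimal sketch of applying the write at graph-construction time rather than inside `process`. The dataset `s4c` and the dispositions are taken from the question; `records`, `year`, and `schema` are placeholders, and the table-naming logic is pulled into a plain helper so it can be exercised on its own:

```python
# Sketch only: the write must be applied to a PCollection with `>>` while
# the pipeline graph is being built, never instantiated inside a DoFn.

def plan_data_table(element, year):
    # Build the destination table name (dataset "s4c" from the question).
    # Accepting the element keeps the signature compatible with Beam's
    # per-element table callables, even though it is unused here.
    return f's4c.plan_data_{year}'

# Inside the pipeline (requires apache-beam[gcp]), the transform would be
# applied roughly like this:
#
#   records | 'Write to BQ' >> beam.io.WriteToBigQuery(
#       table=lambda row: plan_data_table(row, year),
#       schema=schema,
#       create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
#       write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
#       method=beam.io.WriteToBigQuery.Method.FILE_LOADS)
```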

To create a derived value provider for your table name, you would need a "nested" value provider. Unfortunately, this is not supported by the Python SDK. You can use the value provider option directly, though.
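The sketch below illustrates why the derived name fails and what "using the option directly" means. `FakeValueProvider` is a hypothetical stand-in for Beam's runtime value provider, written here only to show the deferred-`.get()` semantics; it is not a real Beam class:

```python
class FakeValueProvider:
    """Hypothetical stand-in for a runtime ValueProvider: the value does
    not exist while the pipeline graph is being constructed, only when
    the job actually runs."""

    def __init__(self, value=None):
        self._value = value

    def get(self):
        if self._value is None:
            raise RuntimeError('option has no value at construction time')
        return self._value


# Deriving a table name at construction time fails, because .get() has
# nothing to return yet -- this is why a "nested"/derived provider would
# be needed, and the Python SDK does not offer one:
year_option = FakeValueProvider()
try:
    table_name = f's4c.plan_data_{year_option.get()}'
except RuntimeError:
    table_name = None

# Passing the provider object itself works instead: WriteToBigQuery
# accepts a ValueProvider for `table` and calls .get() only when the
# write executes, e.g. (sketch, assuming a --table pipeline option):
#
#   records | beam.io.WriteToBigQuery(table=custom_options.table, ...)
```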

As an advanced option, you may be interested in trying out "Flex Templates", which essentially package your whole program as a Docker image and execute it with parameters.
