Beam / Dataflow Custom Python job - Cloud Storage to PubSub

Question

I need to perform a very simple transformation on some data (extract a string from JSON), then write it to PubSub. I'm attempting to use a custom Python Dataflow job to do so.

I've written a job which successfully writes back to Cloud Storage, but my attempts at even the simplest possible write to PubSub (no transformation) result in an error: JOB_MESSAGE_ERROR: Workflow failed. Causes: Expected custom source to have non-zero number of splits.

Has anyone successfully written to PubSub from GCS via Dataflow?

Can anyone shed some light on what is going wrong here?


import argparse

import apache_beam as beam
from apache_beam.io import ReadFromText
from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions


def run(argv=None):
  parser = argparse.ArgumentParser()
  parser.add_argument('--input',
                      dest='input',
                      help='Input file to process.')
  parser.add_argument('--output',
                      dest='output',
                      help='Output file to write results to.')
  known_args, pipeline_args = parser.parse_known_args(argv)

  pipeline_options = PipelineOptions(pipeline_args)
  pipeline_options.view_as(SetupOptions).save_main_session = True
  with beam.Pipeline(options=pipeline_options) as p:

    lines = p | ReadFromText(known_args.input)

    output = lines  # Obviously not necessary, but this is where my simple extract goes

    output | beam.io.WriteToPubSub(known_args.output)  # This doesn't work


if __name__ == '__main__':
  run()
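For context, a job like this is typically submitted to Dataflow with a command along the following lines; the script name, project, bucket and topic are placeholders (not taken from the question), and --streaming is the flag the answer below refers to:

python gcs_to_pubsub.py \
  --input gs://my-bucket/input/*.json \
  --output projects/my-project/topics/my-topic \
  --runner DataflowRunner \
  --project my-project \
  --temp_location gs://my-bucket/tmp/ \
  --streaming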

Answer

Currently it isn't possible to achieve this scenario. When you use streaming mode in Dataflow, the only source you can use is PubSub, and you can't switch to batch mode because the Apache Beam PubSub sources and sinks are only available for streaming (for remote execution such as the Dataflow runner).
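For illustration, the streaming shape that Dataflow does support reads from Pub/Sub rather than from a bounded/custom source such as ReadFromText. A minimal sketch, with placeholder project, subscription and topic names:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions


def run_streaming():
  options = PipelineOptions()
  options.view_as(StandardOptions).streaming = True  # PubSub I/O requires streaming mode on Dataflow

  with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(subscription='projects/my-project/subscriptions/my-sub')  # placeholder
     | 'Extract' >> beam.Map(lambda msg: msg)  # the transformation would go here (messages are bytes)
     | beam.io.WriteToPubSub(topic='projects/my-project/topics/my-topic'))  # placeholder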

That is why the pipeline executes successfully once you drop the WriteToPubSub step and the streaming flag.
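Conversely, the batch version of the job (reading from Cloud Storage and writing back to Cloud Storage with WriteToText, which the question notes already works) runs without issue. A minimal sketch with placeholder paths:

import apache_beam as beam
from apache_beam.io import ReadFromText, WriteToText
from apache_beam.options.pipeline_options import PipelineOptions


def run_batch():
  options = PipelineOptions()
  with beam.Pipeline(options=options) as p:
    (p
     | ReadFromText('gs://my-bucket/input/*.json')   # placeholder input path
     | 'Extract' >> beam.Map(lambda line: line)      # the simple JSON extract would go here
     | WriteToText('gs://my-bucket/output/results')) # placeholder output prefix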
