Streaming write to GCS using Apache Beam per element
Question
The current Beam pipeline is reading files as a stream using FileIO.matchAll().continuously(). This returns a PCollection. I want to write these files back with the same names to another GCS bucket, i.e. each PCollection element is one file metadata/readableFile which should be written back to another bucket after some processing. Is there any sink I should use to write each PCollection item back to GCS, or is there some other way to do it?

Is it possible to create a window per element and then use some GCS sink IO to do this?

When operating on a window (even if it has multiple elements), does Beam guarantee that a window is either fully processed or not processed at all? In other words, are write operations to GCS or BigQuery for a given window atomic, never partial, in case of a failure?
Answer
Can you simply write a DoFn<ReadableFile, Void> that takes the file and copies it to the desired location using the FileSystems API? You don't need any "sink" to do that - and, in any case, this is what all "sinks" (TextIO.write(), AvroIO.write() etc.) are under the hood anyway: they are simply Beam transforms made of ParDo's and GroupByKey's.
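A minimal sketch of that DoFn, assuming the Beam Java SDK and a hypothetical destination bucket (`gs://my-dest-bucket/` is a placeholder, not from the original question). It keeps the source file name and streams the bytes through the FileSystems API; any per-file processing would go between reading and writing:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.WritableByteChannel;

import org.apache.beam.sdk.io.FileIO;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.util.MimeTypes;

/** Copies each matched file into another bucket under the same name. */
class CopyToBucketFn extends DoFn<FileIO.ReadableFile, Void> {
  // Assumption: replace with your actual destination bucket.
  private static final String DEST_PREFIX = "gs://my-dest-bucket/";

  @ProcessElement
  public void processElement(@Element FileIO.ReadableFile file) throws IOException {
    // Keep the original file name, but point the resource at the destination bucket.
    String fileName = file.getMetadata().resourceId().getFilename();
    ResourceId dest =
        FileSystems.matchNewResource(DEST_PREFIX + fileName, false /* isDirectory */);

    // Stream bytes from the source channel to the destination channel.
    try (ReadableByteChannel in = file.open();
        WritableByteChannel out = FileSystems.create(dest, MimeTypes.BINARY)) {
      ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
      while (in.read(buffer) != -1) {
        buffer.flip();
        while (buffer.hasRemaining()) {
          out.write(buffer);
        }
        buffer.clear();
      }
    }
  }
}
```

The pipeline would apply this with something like `readableFiles.apply(ParDo.of(new CopyToBucketFn()))`. If no per-element transformation of the bytes is needed, `FileSystems.copy(srcResourceIds, destResourceIds)` is an even shorter alternative.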