Google Cloud Dataflow: Read from a file with a dynamic filename


Problem Description

I am trying to build a pipeline on Google Cloud Dataflow that would do the following:

  • Listen for events on a Pubsub subscription
  • Extract the filename from the event text
  • Read the file (from a Google Cloud Storage bucket)
  • Store the records in BigQuery

Here is the code so far:

Pipeline pipeline = //create pipeline
pipeline.apply("read events", PubsubIO.readStrings().fromSubscription("sub"))
        .apply("Deserialise events", //Code that produces ParDo.SingleOutput<String, KV<String, byte[]>>)
        .apply(TextIO.read().from(""))???

I am struggling with the 3rd step; I am not quite sure how to access the output of the second step and use it in the third. I have tried writing code that produces the following:

private ParDo.SingleOutput<KV<String, byte[]>, TextIO.Read> readFile(){
    // A class that extends DoFn<KV<String, byte[]>, TextIO.Read> and wraps TextIO.read() in its processElement method
}

However, I am not able to read the file content in the subsequent step.

Could anyone please let me know what I need to write in the 3rd and 4th steps so that I can consume the file line by line and store the output in BigQuery (or just log it)?

Answer

The natural way to express your read would be by using the TextIO.readAll() method, which reads text files from an input PCollection of file names. This method has been introduced into the Beam codebase, but it is not in a released version yet. It will be included in the Beam 2.2.0 release and the corresponding Dataflow 2.2.0 release.
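For illustration, here is a minimal sketch of what the full pipeline could look like once TextIO.readAll() is available. This is not the answerer's code: the extractFileName helper, the assumption that each event yields a single GCS path such as "gs://my-bucket/path/file.txt", and the BigQuery table and column names are all placeholders standing in for your deserialisation step.

import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptor;
import org.apache.beam.sdk.values.TypeDescriptors;

Pipeline pipeline = //create pipeline
pipeline.apply("read events", PubsubIO.readStrings().fromSubscription("sub"))
        // Extract the GCS path from the event text; extractFileName is a
        // hypothetical helper returning e.g. "gs://my-bucket/path/file.txt"
        .apply("extract file names", MapElements
                .into(TypeDescriptors.strings())
                .via((String event) -> extractFileName(event)))
        // Read each named file, producing one element per line
        .apply("read files", TextIO.readAll())
        // Wrap each line in a single-column TableRow (placeholder schema)
        .apply("to table rows", MapElements
                .into(TypeDescriptor.of(TableRow.class))
                .via((String line) -> new TableRow().set("line", line)))
        // Write to an existing table; CREATE_NEVER avoids having to supply a schema
        .apply("write to BigQuery", BigQueryIO.writeTableRows()
                .to("my-project:my_dataset.my_table")
                .withCreateDisposition(BigQueryIO.Write.CreateDisposition.CREATE_NEVER)
                .withWriteDisposition(BigQueryIO.Write.WriteDisposition.WRITE_APPEND));
pipeline.run();

Because readAll() is a transform applied inside the pipeline, rather than a source configured when the pipeline is constructed, it can consume file names computed at runtime from the unbounded Pubsub stream, which is exactly what a dynamic-filename read needs.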

