Streaming Dataflow from Google Cloud Storage to BigQuery

Question

I am trying to insert data from Cloud Storage into BigQuery using Dataflow (Java). I can batch-upload the data; however, I want to set up a streaming upload instead, so that as new objects are added to my bucket, they get pushed to BigQuery.

I have set the PipelineOptions to streaming, and the GCP Console UI shows that the Dataflow pipeline is of streaming type. My initial set of files/objects in the bucket gets pushed to BigQuery.

But as I add new objects to my bucket, these do not get pushed to BigQuery. Why is that? How can I push objects that are added to my Cloud Storage bucket to BigQuery using a streaming Dataflow pipeline?

// Specify PipelineOptions
DataflowPipelineOptions options = PipelineOptionsFactory.as(DataflowPipelineOptions.class);

options.setProject(<project-name>);
options.setStagingLocation(<bucket/staging folder>);
options.setStreaming(true);
options.setRunner(DataflowRunner.class);

My interpretation is that because this is a streaming pipeline, as I add objects to Cloud Storage, they will get pushed to BigQuery.

Please advise.

Recommended answer

How do you create your input collection? You need an unbounded input for the streaming pipeline to keep running; otherwise it will only run temporarily (though it will still use streaming inserts). You could achieve this by reading from a Pub/Sub subscription that receives all the changes in your bucket; see https://cloud.google.com/storage/docs/pubsub-notifications for details.
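
To make the notification approach more concrete, here is a minimal sketch of one step it would likely need: turning the attributes of a Cloud Storage Pub/Sub notification into a gs:// path that a later pipeline step could read. It assumes the notification carries the `eventType`, `bucketId`, and `objectId` attributes described in the linked documentation; the class and method names (`GcsNotificationPaths`, `toGcsPath`) and the example bucket and object names are hypothetical and not part of the original answer.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: maps Cloud Storage notification attributes to a gs:// path. */
class GcsNotificationPaths {

    // Event type Cloud Storage sends when an object has been successfully created.
    static final String OBJECT_FINALIZE = "OBJECT_FINALIZE";

    /**
     * Returns the gs:// path of a newly created object, or Optional.empty() for
     * other event types (deletes, metadata updates) or when attributes are missing.
     */
    static Optional<String> toGcsPath(Map<String, String> attributes) {
        String eventType = attributes.get("eventType");
        String bucket = attributes.get("bucketId");
        String object = attributes.get("objectId");
        if (!OBJECT_FINALIZE.equals(eventType) || bucket == null || object == null) {
            return Optional.empty();
        }
        return Optional.of("gs://" + bucket + "/" + object);
    }

    public static void main(String[] args) {
        // Example attributes with made-up bucket and object names.
        Map<String, String> attrs = Map.of(
                "eventType", "OBJECT_FINALIZE",
                "bucketId", "my-bucket",
                "objectId", "incoming/data.csv");
        System.out.println(toGcsPath(attrs).orElse("(ignored)"));
    }
}
```

In a Beam pipeline, the attribute map would presumably come from messages read with `PubsubIO.readMessagesWithAttributes()` on the notification subscription, which provides the unbounded input the answer refers to; the resulting paths could then be read and written to BigQuery. How that wiring looks depends on the SDK version in use, so treat this as an illustration rather than a complete pipeline.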
