Specifying dynamically generated table name based on line contents


Problem Description


I would like to set up a Dataflow pipeline that reads from a file in a GCS bucket and writes to a BigQuery table. The caveat is that the table to write to should be decided based on the content of the line being read from the GCS file.

My question is: is this possible? If yes, can someone give me any hints as to how to accomplish this?

Also, the GCS files from which reading has to be done are dynamic. I'm using the Object Change Notification service, which calls my App Engine app's registered endpoint whenever any file is added to or removed from the bucket, along with the added/removed file's details. This is the file whose contents have to be streamed to BigQuery.

Is it possible to integrate a Dataflow pipeline with App Engine?

Lastly, is this whole setup even the best way to do this?

Thanks...

Solution

On your first question: see Writing different values to different BigQuery tables in Apache Beam
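In the Beam Python SDK, `WriteToBigQuery` accepts a callable as its `table` argument, so the destination table can be computed per element. A minimal sketch of that approach — the project, dataset, table names, and the `severity` field are all hypothetical, not from the question:

```python
# Route each parsed line to a BigQuery table chosen from its contents.
# Table names ("myproject:logs.errors", etc.) and the "severity" field
# are hypothetical placeholders.

import json

def parse_line(line):
    """Parse one newline-delimited JSON line read from the GCS file."""
    return json.loads(line)

def table_for_line(row):
    """Return the destination table for one parsed record."""
    # Hypothetical rule: records whose "severity" field is ERROR go to
    # a dedicated errors table; everything else goes to the events table.
    if row.get("severity") == "ERROR":
        return "myproject:logs.errors"
    return "myproject:logs.events"

# In the pipeline, the callable is passed directly to WriteToBigQuery,
# which evaluates it once per element (sketch only, not executed here):
#
#   lines = p | beam.io.ReadFromText("gs://my-bucket/input.json")
#   (lines
#    | beam.Map(parse_line)
#    | beam.io.WriteToBigQuery(
#          table=table_for_line,
#          write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```

The linked answer covers the same idea for the Java SDK via `DynamicDestinations`.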

On your second question: one way to accomplish that would be to have your appengine app publish every change notification to Cloud Pubsub, and have a constantly running streaming Dataflow pipeline watching the pubsub topic and writing to BigQuery.
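One way to sketch the App Engine side of that: turn the Object Change Notification fields into a Pub/Sub message payload the streaming pipeline can act on. The field names, project, and topic below are hypothetical; the actual publish call would use the `google-cloud-pubsub` client:

```python
# Build a Pub/Sub message body from one Object Change Notification
# callback. Field names and the topic below are hypothetical.

import json

def notification_to_message(bucket, object_name, resource_state):
    """Encode one OCN callback as a Pub/Sub message payload.

    resource_state is the OCN state string, e.g. "exists" for an
    added object or "not_exists" for a deleted one.
    """
    payload = {
        "bucket": bucket,
        "name": object_name,
        "event": resource_state,
        "gcs_uri": "gs://%s/%s" % (bucket, object_name),
    }
    return json.dumps(payload).encode("utf-8")

# The App Engine handler would then publish it (sketch, not executed):
#
#   from google.cloud import pubsub_v1
#   publisher = pubsub_v1.PublisherClient()
#   topic = publisher.topic_path("my-project", "gcs-changes")
#   publisher.publish(topic, notification_to_message(bucket, name, state))
#
# The always-on streaming pipeline reads the topic with
# beam.io.ReadFromPubSub and writes to BigQuery as in the first answer.
```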

On your third question: yes, assuming your data representation on GCS is fixed, the rest seems like a reasonable ingestion architecture to me :)
