Google Dataflow - save the data into multiple BigQuery tables


Problem description



I'm using Google Dataflow 1.9 to save data into BigQuery tables. I'm looking for a way to control the table name into which a (PCollection) element is written, based on some value in that element. In our case, the elements contain a user-id, and we wish to write each element to its own user table, dynamically.

Solution

With 1.9.0 the only options are to either (1) partition the elements into multiple output collections, and then write each output collection to a specific table, or (2) window the elements and select the destination based on the window. Option 1 will only work if there is a relatively small set of destination tables, and option 2 will only work if the decision is based on the window, which won't fit your use case of per-user destinations very well.
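Option (1) can be sketched as follows. The routing decision itself is a plain function from an element to a partition index, and each partition then gets its own fixed sink; the Beam wiring is shown in comments since it requires the Beam SDK on the classpath. The table names and the "region" field here are illustrative assumptions, not taken from the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of option (1): a small, fixed set of destination tables.
// Each element maps to a partition index, and each partition is written with
// its own BigQueryIO sink. The pipeline wiring would look roughly like:
//
//   PCollectionList<TableRow> parts = rows.apply(
//       Partition.of(TABLES.size(),
//           (row, numPartitions) -> partitionFor((String) row.get("region"), numPartitions)));
//   for (int i = 0; i < TABLES.size(); i++) {
//     parts.get(i).apply(BigQueryIO.writeTableRows().to(TABLES.get(i)));
//   }
public class PartitionRouting {
    // Illustrative destination tables (assumed names).
    static final List<String> TABLES = Arrays.asList(
        "my-project:my_dataset.events_us",
        "my-project:my_dataset.events_eu",
        "my-project:my_dataset.events_other");

    // Maps a region value to a partition index in [0, numPartitions).
    // Unknown regions fall through to the last ("other") partition.
    static int partitionFor(String region, int numPartitions) {
        int idx;
        if ("us".equals(region)) {
            idx = 0;
        } else if ("eu".equals(region)) {
            idx = 1;
        } else {
            idx = 2;
        }
        return Math.min(idx, numPartitions - 1);
    }

    public static void main(String[] args) {
        System.out.println(TABLES.get(partitionFor("eu", TABLES.size())));
        System.out.println(TABLES.get(partitionFor("jp", TABLES.size())));
    }
}
```

Note that this only scales to a handful of tables, because every destination needs its own branch of the pipeline; it cannot express one-table-per-user.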

If you upgrade to 2.0.0 the destination may be specified by a function that receives the window and data element, using either DynamicDestinations or a SerializableFunction. This would allow you to receive each element and then choose the destination based on the user ID.
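A minimal sketch of the 2.0.0 approach: a function derives a per-user table spec from each element, and that function is handed to `BigQueryIO` so the destination is chosen element by element. The Beam call shown in the comment is the `SerializableFunction` overload of `BigQueryIO.write().to(...)`; the project/dataset names and the sanitization rule below are illustrative assumptions.

```java
// Hypothetical sketch: deriving a per-user BigQuery table spec per element.
// With Beam/Dataflow 2.0.0 this helper would plug in roughly like:
//
//   rows.apply(BigQueryIO.writeTableRows()
//       .to((ValueInSingleWindow<TableRow> v) ->
//           new TableDestination(
//               tableForUser((String) v.getValue().get("user_id")), null)));
//
// (DynamicDestinations offers the same per-element choice with more control,
// e.g. a per-table schema.)
public class PerUserTables {
    // Illustrative project/dataset names (assumptions).
    static final String PROJECT = "my-project";
    static final String DATASET = "user_data";

    // Builds "project:dataset.user_<id>", replacing characters that are not
    // letters, digits, or underscores with '_' so the result is a legal
    // BigQuery table name.
    static String tableForUser(String userId) {
        String safe = userId.replaceAll("[^A-Za-z0-9_]", "_");
        return PROJECT + ":" + DATASET + ".user_" + safe;
    }

    public static void main(String[] args) {
        System.out.println(tableForUser("alice-42"));
        System.out.println(tableForUser("bob"));
    }
}
```

One caveat worth noting with per-user destinations: each distinct table spec becomes a separate BigQuery load/stream target, so a very large number of users means a very large number of tables.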

