Join Nested Structure Table using Dataflow Java code

Question

My objective is to join two tables, where the second table is a normal (flat) table and the first is a nested-structure table. The join key sits inside the nested structure of the first table. How can I join these two tables using Dataflow Java code? WithKeys (org.apache.beam.sdk.transforms.WithKeys) accepts a direct column name, but it does not seem to allow a path such as firstTable.columnname. Could someone help me solve this case?

Recommended answer

If both tables are similarly large, consider using the CoGroupByKey transform. Before this operation you will have to transform your data into two PCollections, each keyed by the join key. For the nested table, the key does not have to be a top-level column: WithKeys also accepts a function, which can read the value out of the nested structure.
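The CoGroupByKey approach can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the original answerer's code: it assumes both tables arrive as PCollection<TableRow> (for example, read from BigQuery), that the nested table holds a single, non-repeated record named `customer` with the join field `id`, and that the flat table has a top-level `customer_id` column. Those field names, the output field names `nested` and `flat`, and the class name `NestedJoin` are all hypothetical.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;
import org.apache.beam.sdk.values.TypeDescriptors;

public class NestedJoin {
  // Tags identify which input each grouped value came from.
  static final TupleTag<TableRow> NESTED_TAG = new TupleTag<TableRow>() {};
  static final TupleTag<TableRow> FLAT_TAG = new TupleTag<TableRow>() {};

  // Assumed layout: the join key lives at customer.id and is present on every row.
  @SuppressWarnings("unchecked")
  static String nestedKey(TableRow row) {
    Map<String, Object> customer = (Map<String, Object>) row.get("customer");
    return String.valueOf(customer.get("id"));
  }

  static PCollection<TableRow> join(
      PCollection<TableRow> nestedTable, PCollection<TableRow> flatTable) {
    // Key the nested table with a function instead of a column name.
    PCollection<KV<String, TableRow>> keyedNested = nestedTable.apply("KeyNested",
        WithKeys.of((TableRow r) -> nestedKey(r))
            .withKeyType(TypeDescriptors.strings()));

    // Key the flat table by its (assumed) top-level column.
    PCollection<KV<String, TableRow>> keyedFlat = flatTable.apply("KeyFlat",
        WithKeys.of((TableRow r) -> String.valueOf(r.get("customer_id")))
            .withKeyType(TypeDescriptors.strings()));

    // Group both inputs by key.
    PCollection<KV<String, CoGbkResult>> grouped =
        KeyedPCollectionTuple.of(NESTED_TAG, keyedNested)
            .and(FLAT_TAG, keyedFlat)
            .apply("CoGroup", CoGroupByKey.create());

    // Emit one output row per matching (nested, flat) pair, i.e. an inner join.
    return grouped
        .apply("EmitPairs", ParDo.of(new DoFn<KV<String, CoGbkResult>, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            CoGbkResult result = c.element().getValue();
            for (TableRow left : result.getAll(NESTED_TAG)) {
              for (TableRow right : result.getAll(FLAT_TAG)) {
                c.output(new TableRow().set("nested", left).set("flat", right));
              }
            }
          }
        }))
        .setCoder(TableRowJsonCoder.of());
  }
}
```

Keys that appear in only one of the two inputs produce no output here. For outer-join behaviour, the DoFn would need to emit rows when one of the two iterables is empty.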

If one table is significantly smaller than the other, feeding the smaller PCollection as a side input to a ParDo over the larger PCollection might be a better option.
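The side-input variant might look like the sketch below. It assumes the flat table is the smaller one and fits in worker memory, and it reuses the same hypothetical layout as before: a single nested record `customer` with field `id` in the large table, and a top-level `customer_id` column in the small one. The class name `SideInputJoin` and the output field names are likewise illustrative only.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Map;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;
import org.apache.beam.sdk.values.TypeDescriptors;

public class SideInputJoin {
  // Assumed layout: the join key lives at customer.id and is present on every row.
  @SuppressWarnings("unchecked")
  static String nestedKey(TableRow row) {
    Map<String, Object> customer = (Map<String, Object>) row.get("customer");
    return String.valueOf(customer.get("id"));
  }

  static PCollection<TableRow> join(
      PCollection<TableRow> largeNested, PCollection<TableRow> smallFlat) {
    // Turn the small table into a key -> rows lookup that every worker can read.
    final PCollectionView<Map<String, Iterable<TableRow>>> lookup = smallFlat
        .apply("KeySmall",
            WithKeys.of((TableRow r) -> String.valueOf(r.get("customer_id")))
                .withKeyType(TypeDescriptors.strings()))
        .apply("AsMultimap", View.<String, TableRow>asMultimap());

    // Stream over the large table and look each nested key up in the side input.
    return largeNested
        .apply("Lookup", ParDo.of(new DoFn<TableRow, TableRow>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            Iterable<TableRow> matches =
                c.sideInput(lookup).get(nestedKey(c.element()));
            if (matches == null) {
              return; // no row in the small table for this key
            }
            for (TableRow match : matches) {
              c.output(new TableRow().set("nested", c.element()).set("flat", match));
            }
          }
        }).withSideInputs(lookup))
        .setCoder(TableRowJsonCoder.of());
  }
}
```

This avoids shuffling the large table, at the cost of materializing the small table for each worker. If the small table has at most one row per key, `View.asMap()` would be a simpler choice than a multimap.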
