Join a Nested Structure Table Using Dataflow Java Code
Question
My objective is to join two tables: the first has a nested structure and the second is a normal (flat) table. The join key sits inside the nested structure of the first table. How can I join these two tables using Dataflow Java code? WithKeys (org.apache.beam.sdk.transforms.WithKeys) seems to accept only a direct column name and does not allow a nested reference such as firstTable.columnname. Could someone help me solve this?
Recommended answer
If both tables are of comparable size, consider using the CoGroupByKey transform. Before this operation, you will have to transform your data into two PCollections, each keyed by the join key.
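A minimal sketch of that approach, under stated assumptions: both tables are read as TableRow elements, the nested record is called "details", and the join key is called "id" in both tables. Those field names, the class name, and the helper names are hypothetical and not taken from the question. WithKeys.of also accepts a SerializableFunction, so the key can be computed from the nested field instead of being a direct column. The result below is an inner join that emits one pair per matching combination.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.transforms.join.CoGbkResult;
import org.apache.beam.sdk.transforms.join.CoGroupByKey;
import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TupleTag;

class NestedKeyJoin {

  // Assumption: the nested record is "details" and holds the join key "id".
  // Assumes every row carries the nested record; a missing one would throw here.
  static String nestedKey(TableRow row) {
    @SuppressWarnings("unchecked")
    Map<String, Object> details = (Map<String, Object>) row.get("details");
    return String.valueOf(details.get("id"));
  }

  // Assumption: the flat table exposes the same key as a top-level "id" column.
  static String flatKey(TableRow row) {
    return String.valueOf(row.get("id"));
  }

  static PCollection<KV<TableRow, TableRow>> join(
      PCollection<TableRow> nestedTable, PCollection<TableRow> flatTable) {

    // Key each table by the join key; the nested table uses a function, not a column name.
    PCollection<KV<String, TableRow>> keyedNested =
        nestedTable.apply("KeyNested", WithKeys.of(
            new SerializableFunction<TableRow, String>() {
              @Override
              public String apply(TableRow row) {
                return nestedKey(row);
              }
            }));
    PCollection<KV<String, TableRow>> keyedFlat =
        flatTable.apply("KeyFlat", WithKeys.of(
            new SerializableFunction<TableRow, String>() {
              @Override
              public String apply(TableRow row) {
                return flatKey(row);
              }
            }));

    final TupleTag<TableRow> nestedTag = new TupleTag<>();
    final TupleTag<TableRow> flatTag = new TupleTag<>();

    // Group both keyed collections by key in a single shuffle.
    PCollection<KV<String, CoGbkResult>> grouped =
        KeyedPCollectionTuple.of(nestedTag, keyedNested)
            .and(flatTag, keyedFlat)
            .apply("CoGroup", CoGroupByKey.<String>create());

    // Inner join: emit a pair only when both sides have rows for the key.
    return grouped.apply("EmitPairs", ParDo.of(
        new DoFn<KV<String, CoGbkResult>, KV<TableRow, TableRow>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            CoGbkResult result = c.element().getValue();
            for (TableRow left : result.getAll(nestedTag)) {
              for (TableRow right : result.getAll(flatTag)) {
                c.output(KV.of(left, right));
              }
            }
          }
        }));
  }
}
```

For a left or right outer join, the DoFn would also need to emit rows whose opposite side is empty; how unmatched rows should look depends on the output schema, so that part is left out here.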
If one table is significantly smaller than the other, feeding the smaller PCollection as a side input to a ParDo over the larger PCollection might be a better option.
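The side-input variant could look roughly like the sketch below. It assumes the flat table is the small one and fits in worker memory, and it reuses hypothetical field names ("details" for the nested record, "id" for the key) that do not come from the question. The small table is turned into a multimap view, and each row of the large nested table looks up its matches by the key read from the nested record.

```java
import com.google.api.services.bigquery.model.TableRow;
import java.util.Map;
import org.apache.beam.sdk.transforms.DoFn;
import org.apache.beam.sdk.transforms.ParDo;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.beam.sdk.transforms.View;
import org.apache.beam.sdk.transforms.WithKeys;
import org.apache.beam.sdk.values.KV;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.PCollectionView;

class SideInputJoin {

  // Assumption: the nested record is "details" and holds the join key "id".
  static String nestedKey(TableRow row) {
    @SuppressWarnings("unchecked")
    Map<String, Object> details = (Map<String, Object>) row.get("details");
    return String.valueOf(details.get("id"));
  }

  static PCollection<KV<TableRow, TableRow>> join(
      PCollection<TableRow> largeNestedTable, PCollection<TableRow> smallFlatTable) {

    // Materialize the small table as key -> rows; a multimap tolerates duplicate keys.
    final PCollectionView<Map<String, Iterable<TableRow>>> smallByKey =
        smallFlatTable
            .apply("KeySmall", WithKeys.of(
                new SerializableFunction<TableRow, String>() {
                  @Override
                  public String apply(TableRow row) {
                    // Assumption: the flat table has a top-level "id" column.
                    return String.valueOf(row.get("id"));
                  }
                }))
            .apply("AsMultimap", View.<String, TableRow>asMultimap());

    // Stream over the large table and look up matches in the side input.
    return largeNestedTable.apply("LookupJoin", ParDo.of(
        new DoFn<TableRow, KV<TableRow, TableRow>>() {
          @ProcessElement
          public void processElement(ProcessContext c) {
            Map<String, Iterable<TableRow>> lookup = c.sideInput(smallByKey);
            Iterable<TableRow> matches = lookup.get(nestedKey(c.element()));
            if (matches == null) {
              return; // inner join: rows without a match are dropped
            }
            for (TableRow match : matches) {
              c.output(KV.of(c.element(), match));
            }
          }
        }).withSideInputs(smallByKey));
  }
}
```

This avoids shuffling the large table, at the cost of making the whole small table available to every worker.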