Spark - Joining 2 PairRDD elements

Problem description

Hi, I have a JavaPairRDD with 2 elements:

("TypeA", List<jsonTypeA>),

("TypeB", List<jsonTypeB>)

I need to combine the 2 pairs into 1 pair of type:

("TypeA_B", List<jsonCombinedAPlusB>)

I need to combine the 2 lists into 1 list, where each pair of JSON objects to be combined (one of type A and one of type B) shares a common field I can join on.

Note that the list of type A is significantly smaller than the other, and the join should be an inner join, so the resulting list should be as small as the list of type A.

What is the most efficient way to do that?

Recommended answer

rdd.join(otherRdd) gives you an inner join of the two RDDs. To use it, you need to transform both RDDs into PairRDDs keyed by the common attribute you will be joining on. Something like this (example, untested):

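// rddA, rddB: pair RDDs whose values are the type-A / type-B JSON records.
// key(json) extracts the common join field; merge(a, b) is your own logic.
val rddAKeyed = rddA.values.keyBy(json => key(json))
val rddBKeyed = rddB.values.keyBy(json => key(json))

val joined = rddAKeyed.join(rddBKeyed).map {
  case (_, (jsonA, jsonB)) => ("TypeA_B", merge(jsonA, jsonB))
}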

Here, merge is the specific business logic for combining the two JSON objects, and key extracts the common attribute you join on.
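Because the type-A list is much smaller, a common alternative to the shuffle-based join is a broadcast (map-side) hash join. The idea is to collect the small side, share it with sc.broadcast, and look up matches inside rddB.flatMap, so the large side is never shuffled. This assumes the type-A data fits in memory on each executor. The sketch below shows only the hash-join logic on plain Scala collections, without Spark. JsonA, JsonB, JsonAB, the id field and merge are hypothetical stand-ins for your record types, join key and business logic.

```scala
// Hypothetical record types; `id` stands in for the common join field.
final case class JsonA(id: String, name: String)
final case class JsonB(id: String, score: Int)
final case class JsonAB(id: String, name: String, score: Int)

object HashJoinSketch {
  // Hypothetical merge: replace with your own business logic.
  def merge(a: JsonA, b: JsonB): JsonAB = JsonAB(a.id, a.name, b.score)

  // Inner hash join: index the small side once by key, then stream the
  // large side and emit a merged record only where the key matches.
  def innerJoin(small: Seq[JsonA], large: Seq[JsonB]): Seq[JsonAB] = {
    val index: Map[String, Seq[JsonA]] = small.groupBy(_.id)
    large.flatMap { b =>
      index.getOrElse(b.id, Seq.empty).map(a => merge(a, b))
    }
  }

  def main(args: Array[String]): Unit = {
    val as = Seq(JsonA("1", "x"), JsonA("2", "y"))
    val bs = Seq(JsonB("1", 10), JsonB("3", 30), JsonB("2", 20))
    // B record "3" has no type-A match, so it is dropped (inner join).
    println(innerJoin(as, bs)) // prints List(JsonAB(1,x,10), JsonAB(2,y,20))
  }
}
```

Like join, this keeps only the pairs whose keys match on both sides. If several type-A records share a key, each matching type-B record is paired with all of them.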
