Cross combine two RDDs using pyspark

Question

How can I cross combine (is this the correct way to describe it?) the two RDDs?

Input:

rdd1 = [a, b]
rdd2 = [c, d]

Output:

rdd3 = [(a, c), (a, d), (b, c), (b, d)]
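The requested output is the Cartesian product of the two inputs. As a local point of reference only (plain Python lists, no Spark involved; list1 and list2 are illustrative names), itertools.product builds the same pairs in the same order:

```python
from itertools import product

# Plain lists standing in for the contents of rdd1 and rdd2.
list1 = ["a", "b"]
list2 = ["c", "d"]

# product() yields every (x, y) with x from list1 and y from list2.
pairs = list(product(list1, list2))
print(pairs)  # [('a', 'c'), ('a', 'd'), ('b', 'c'), ('b', 'd')]
```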

I tried rdd3 = rdd1.flatMap(lambda x: rdd2.map(lambda y: (x, y))), but it complains: "It appears that you are attempting to broadcast an RDD or reference an RDD from an action or transformation." I guess that means you cannot nest actions the way you would in a list comprehension, and one statement can only do one action.

Recommended answer

So, as you have noticed, you can't perform a transformation inside another transformation (note that flatMap and map are transformations rather than actions, since they return RDDs). Thankfully, what you're trying to accomplish is directly supported by another transformation in the Spark API, namely cartesian (see http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD).

So what you want is rdd1.cartesian(rdd2).
