Spark: produce RDD[(X, X)] of all possible combinations from RDD[X]
Question
Is it possible in Spark to implement the `.combinations` function from Scala collections?
/** Iterates over combinations.
*
* @return An Iterator which traverses the possible n-element combinations of this $coll.
* @example `"abbbc".combinations(2) = Iterator(ab, ac, bb, bc)`
*/
For example, how can I get from RDD[X] to RDD[List[X]] or RDD[(X, X)] for combinations of size 2? Let's assume that all values in the RDD are unique.
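For reference, the collections behaviour the question wants to reproduce can be seen on a plain Scala list (no Spark involved):

```scala
// Plain Scala collections already provide .combinations;
// this is the semantics the question wants on an RDD.
val pairs = List(1, 2, 3, 4).combinations(2).toList
// Each 2-element subset appears exactly once:
// List(List(1, 2), List(1, 3), List(1, 4), List(2, 3), List(2, 4), List(3, 4))
println(pairs)
```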
Cartesian product and combinations are two different things: the cartesian product will create an RDD of size rdd.size() ^ 2, while combinations will create an RDD of size rdd.size() choose 2, i.e. n(n-1)/2 for n distinct elements.
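A quick sanity check of those two sizes, sketched on plain Scala collections (which support the same cartesian-then-filter shape as the RDD code below):

```scala
// Size check for n = 5: cartesian product vs. combinations.
val xs = (1 to 5).toList
val cartesian = for (a <- xs; b <- xs) yield (a, b)
val combos    = for (a <- xs; b <- xs if a < b) yield (a, b)
println(cartesian.size) // 25 = 5 ^ 2
println(combos.size)    // 10 = 5 choose 2
```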
val rdd = sc.parallelize(1 to 5)
val combinations = rdd.cartesian(rdd).filter { case (a, b) => a < b }
combinations.collect()
Note this will only work if an ordering is defined on the elements of the list, since we use <. This one only works for choosing two, but it can easily be extended by making sure the relationship a < b holds for all a and b in each emitted sequence.
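If no ordering is defined on the elements, one workaround (an assumption, not part of the original answer) is to compare positions instead of values via zipWithIndex; Spark's RDD also provides zipWithIndex, so the same shape carries over. A minimal sketch on plain collections:

```scala
// Hypothetical helper (not from the original answer): pair each element
// with its index and compare indices, so no Ordering on A is required
// and duplicate values are still kept apart.
def combinations2[A](xs: Seq[A]): Seq[(A, A)] = {
  val indexed = xs.zipWithIndex
  for {
    (a, i) <- indexed
    (b, j) <- indexed
    if i < j // index comparison stands in for a < b
  } yield (a, b)
}

println(combinations2(Seq("x", "x", "y"))) // duplicate values are handled
```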