Explicit sort in Cartesian transformation in Scala Spark


Question

I am using the Cartesian transformation in Spark with Scala. Suppose my input consists of 4 elements (numbers, characters, or tuples), say:

val myRDD = sc.parallelize(Array("e1", "e2", "e3", "e4"))

myRDD.cartesian(myRDD) yields every possible pair, but not necessarily in order. What is a smart way to get those pairs in order? That is:

Array((e1,e1), (e1,e2), (e1,e3), (e1,e4), (e2,e1), (e2,e2), (e2,e3), (e2,e4), (e3,e1), (e3,e2), (e3,e3), (e3,e4), (e4,e1), (e4,e2), (e4,e3), (e4,e4))

Solution

If what you need is to identify each point (so that you can determine each pair of points and their L2 distance), then what you really require is an id attached to each entry in the RDD or DataFrame.
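To make the id idea concrete for the ordering asked about in the question, here is a minimal local sketch in plain Python, with no Spark involved. The function name `ordered_pairs` is hypothetical, and the list is reversed on purpose only to mimic an arbitrary partition order. On a real RDD, `zipWithIndex()` and `sortBy()` would presumably play the roles that `enumerate` and `sorted` play here.

```python
from itertools import product


def ordered_pairs(elements):
    # Tag each element with its position (the "id" the answer refers to).
    indexed = [(elem, idx) for idx, elem in enumerate(elements)]
    # Build every pair, then reverse the list to mimic an arbitrary order.
    pairs = list(product(indexed, indexed))[::-1]
    # Sort on the (left id, right id) key to restore a deterministic order.
    ranked = sorted(pairs, key=lambda p: (p[0][1], p[1][1]))
    # Drop the ids again so only the original elements remain.
    return [(a, b) for (a, _), (b, _) in ranked]


print(ordered_pairs(["e1", "e2", "e3", "e4"])[:5])
# [('e1', 'e1'), ('e1', 'e2'), ('e1', 'e3'), ('e1', 'e4'), ('e2', 'e1')]
```

Sorting on the ids rather than on the elements themselves keeps the input order even when the elements have no natural ordering.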

If you want to use an RDD, the approach I recommend is:

myRDD = sc.parallelize([(0, (0.0, 0.0)), (1, (2.0, 0.0)), 
                        (2, (-3.0, 2.0)), (3, (-6.0, -4.0))])

combinations = myRDD.cartesian(myRDD).coalesce(32)

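def distance(pair):
    # Python 3 has no tuple-parameter lambdas, so unpack inside the function.
    (id1, (x1, y1)), (id2, (x2, y2)) = pair
    return (id1, id2, ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5)

# Keep each unordered pair once (id1 < id2), then compute its L2 distance.
distances = combinations\
    .filter(lambda pair: pair[0][0] < pair[1][0])\
    .map(distance)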

distances.collect()
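The order in which `collect()` returns these rows is not something to rely on, so one option is to sort the collected list on the driver, for example with `sorted(distances.collect())`. The sketch below reproduces the same filter and distance logic in plain Python on the four sample points, as a local cross-check rather than a Spark job. The name `pairwise_distances` is hypothetical.

```python
from itertools import product
from math import hypot

points = [(0, (0.0, 0.0)), (1, (2.0, 0.0)),
          (2, (-3.0, 2.0)), (3, (-6.0, -4.0))]


def pairwise_distances(entries):
    rows = []
    for (id1, (x1, y1)), (id2, (x2, y2)) in product(entries, entries):
        # Same filter as above: keep each unordered pair exactly once.
        if id1 < id2:
            rows.append((id1, id2, hypot(x1 - x2, y1 - y2)))
    # Tuples sort by id1, then id2, which gives a deterministic order.
    return sorted(rows)


for row in pairwise_distances(points):
    print(row)
```

With four points this yields six rows, starting at the pair (0, 1) with distance 2.0 and ending at the pair (2, 3).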
