Apache Spark distance between two points using squaredDistance

Question

I have an RDD collection of vectors, where each vector represents a point with x and y coordinates. For example, the input file looks like this:

1.1 1.2
6.1 4.8
0.1 0.1
9.0 9.0
9.1 9.1
0.4 2.1

I read it with:

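  // Assumes Breeze vectors, as suggested by DenseVector and squaredDistance
  import breeze.linalg.{DenseVector, Vector}

  // Parse one space-separated line into a vector of doubles
  def parseVector(line: String): Vector[Double] = {
    DenseVector(line.split(' ').map(_.toDouble))
  }

  val lines = sc.textFile(inputFile)
  val points = lines.map(parseVector).cache()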

Also, I have an epsilon:

  val eps = 2.0

For each point I want to find its neighbors that lie within the epsilon distance. I tried:

points.foreach(point =>
  // squaredDistance(point, ?) what should I write here?
)

How can I loop over all the points and, for each point, find its neighbors? Perhaps with a map function?

Recommended answer

You could do something like this:

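// `distance` is not defined in this answer; supply your own distance function
val distanceBetweenPoints = points.cartesian(points)
    .filter { case (x, y) => x != y } // remove the (x,x) diagonal
    .map { case (x, y) => ((x, y), distance(x, y)) }
// keep only the pairs whose distance is at most eps
val pointsWithinEps = distanceBetweenPoints.filter { case ((x, y), d) => d <= eps }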
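
The snippet above calls a `distance` function without defining it. A minimal sketch of one possible definition follows, assuming Euclidean distance is what is wanted. The object name `DistanceSketch` and the use of plain `Seq[Double]` in place of the question's vector type are illustrative assumptions, not part of the original answer.

```scala
object DistanceSketch {
  // Hypothetical helper: Euclidean distance between two points
  // given as coordinate sequences of equal length.
  def distance(a: Seq[Double], b: Seq[Double]): Double = {
    require(a.length == b.length, "points must have the same dimension")
    // Sum the squared coordinate differences, then take the square root.
    math.sqrt(a.zip(b).map { case (p, q) => (p - q) * (p - q) }.sum)
  }

  def main(args: Array[String]): Unit = {
    // A 3-4-5 right triangle: the distance is 5.0
    println(distance(Seq(0.0, 0.0), Seq(3.0, 4.0)))
  }
}
```

If a library routine such as the `squaredDistance` named in the question is used instead, remember that it returns the squared value, so it has to be compared against `eps * eps` rather than `eps`.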

You could also combine the distance calculation within the filter if you don't care about the distance between the points afterwards.
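
As a rough illustration of folding the distance check into the filter, the sketch below runs on a local Scala collection rather than an RDD, so that it can be tried without a cluster. The names `NeighborsSketch`, `sqDist` and `neighbors` are hypothetical. It compares the squared distance against `eps * eps`, which avoids the square root, and it excludes self-pairs by index rather than by value, so two distinct points with identical coordinates still count as neighbors.

```scala
object NeighborsSketch {
  // Hypothetical helper: squared Euclidean distance (no square root needed).
  def sqDist(a: Seq[Double], b: Seq[Double]): Double =
    a.zip(b).map { case (p, q) => (p - q) * (p - q) }.sum

  // For each point index, the indices of all other points within eps.
  // Points without any neighbor do not appear in the result.
  def neighbors(points: Seq[Seq[Double]], eps: Double): Map[Int, Seq[Int]] = {
    val indexed = points.zipWithIndex
    val pairs = for {
      (p, i) <- indexed
      (q, j) <- indexed
      // distance test folded into the filter; i != j drops the diagonal
      if i != j && sqDist(p, q) <= eps * eps
    } yield (i, j)
    pairs.groupBy(_._1).map { case (i, ps) => i -> ps.map(_._2) }
  }

  def main(args: Array[String]): Unit = {
    // The six sample points from the question
    val points = Seq(
      Seq(1.1, 1.2), Seq(6.1, 4.8), Seq(0.1, 0.1),
      Seq(9.0, 9.0), Seq(9.1, 9.1), Seq(0.4, 2.1))
    neighbors(points, 2.0).toSeq.sortBy(_._1).foreach {
      case (i, ns) => println(s"$i -> ${ns.mkString(", ")}")
    }
  }
}
```

On an RDD, the same shape could be built from `zipWithIndex`, `cartesian`, `filter` and `groupByKey`. Keep in mind that `cartesian` produces every pair of points, so the cost grows quadratically with the number of points.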
