Addition of two RDD[mllib.linalg.Vector]s


Question

I need to add two matrices that are stored in two files.

Both latest1.txt and latest2.txt have the following content:


1 2 3
4 5 6
7 8 9

I am reading those files as follows:

scala> val rows = sc.textFile("latest1.txt").map { line => val values = line.split(' ').map(_.toDouble)
    Vectors.sparse(values.length,values.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}

scala> val r1 = rows
r1: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map at <console>:14

scala> val rows = sc.textFile("latest2.txt").map { line => val values = line.split(' ').map(_.toDouble)
    Vectors.sparse(values.length,values.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}

scala> val r2 = rows
r2: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map at <console>:14

I want to add r1 and r2. Is there any way to add these two RDD[mllib.linalg.Vector]s in Apache Spark?

Recommended answer

This is actually a good question. I work with mllib regularly and had not realized that these basic linear algebra operations are not easily accessible.

The point is that the underlying breeze vectors have all of the linear algebra operations you would expect, including of course the basic element-wise addition that you specifically mentioned.

However, the breeze implementation is hidden from the outside world via:

private[mllib]

So, from the outside world/public API perspective, how do we access those primitives?

Some of them are already exposed, e.g. the squared distance between two vectors:

/**
 * Returns the squared distance between two Vectors.
 * @param v1 first Vector.
 * @param v2 second Vector.
 * @return squared distance between two Vectors.
 */
def sqdist(v1: Vector, v2: Vector): Double = { 
  ...
}
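
For illustration, calling this method from user code might look like the sketch below. It assumes a Spark version in which `sqdist` is publicly available on the `Vectors` object in `org.apache.spark.mllib.linalg`; the vector values are made up for the example.

```scala
import org.apache.spark.mllib.linalg.Vectors

// Two example vectors (values chosen only for illustration)
val a = Vectors.dense(1.0, 2.0, 3.0)
val b = Vectors.dense(4.0, 5.0, 6.0)

// Squared Euclidean distance: (4-1)^2 + (5-2)^2 + (6-3)^2 = 27
val d = Vectors.sqdist(a, b)
```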

However, the selection of such available methods is limited, and in fact does not include basic operations such as element-wise addition, subtraction, multiplication, etc.

So here is the best approach I could see:


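  • Convert the mllib vectors to breeze vectors

  • Perform the vector operations in breeze

  • Convert the breeze result back to an mllib vector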

Here is some sample code:

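import breeze.linalg.DenseVector
import org.apache.spark.mllib.linalg.Vectors

val v1 = Vectors.dense(1.0, 2.0, 3.0)
val v2 = Vectors.dense(4.0, 5.0, 6.0)
// Wrap the mllib values in breeze vectors, which support element-wise +
val bv1 = new DenseVector(v1.toArray)
val bv2 = new DenseVector(v2.toArray)

// Add in breeze, then convert the result back to an mllib vector
val vectout = Vectors.dense((bv1 + bv2).toArray)
vectout: org.apache.spark.mllib.linalg.Vector = [5.0,7.0,9.0]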
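
The sample above adds two individual vectors. To apply the same idea to the two RDDs from the question, one option is to pair up corresponding rows with `RDD.zip` and add each pair. The following is only a sketch under stated assumptions, not a definitive implementation. The names `VectorAddition`, `addVectors` and `addRows` are hypothetical. For simplicity it adds plain arrays instead of going through breeze, so it needs no extra dependency. It also assumes that `r1` and `r2` have the same number of partitions and the same number of rows in each partition, which `zip` requires; that is likely when both are read with `sc.textFile` from files with identical layout, but it is not guaranteed in general.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.rdd.RDD

object VectorAddition {
  // Hypothetical helper: element-wise sum of two mllib vectors of equal size.
  // It works on the dense array form, so sparse inputs give a dense result.
  def addVectors(a: Vector, b: Vector): Vector = {
    require(a.size == b.size, "vectors must have the same size")
    val x = a.toArray
    val y = b.toArray
    Vectors.dense(Array.tabulate(x.length)(i => x(i) + y(i)))
  }

  // Pair up corresponding rows of the two RDDs and add them.
  // Assumption: both RDDs have identical partitioning and row order.
  def addRows(r1: RDD[Vector], r2: RDD[Vector]): RDD[Vector] =
    r1.zip(r2).map { case (a, b) => addVectors(a, b) }
}
```

In the shell, `VectorAddition.addRows(r1, r2).collect()` would then return the summed rows. Since both files in the question hold the same matrix, each row of the result should be twice the corresponding input row.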
