Addition of two RDD[mllib.linalg.Vector]s
Problem description
I need to add two matrices that are stored in two files.
Both latest1.txt and latest2.txt contain the following:
1 2 3
4 5 6
7 8 9
I am reading those files as follows:
scala> import org.apache.spark.mllib.linalg.Vectors
scala> val rows = sc.textFile("latest1.txt").map { line => val values = line.split(' ').map(_.toDouble)
Vectors.sparse(values.length,values.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}
scala> val r1 = rows
r1: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map at <console>:14
scala> val rows = sc.textFile("latest2.txt").map { line => val values = line.split(' ').map(_.toDouble)
Vectors.sparse(values.length,values.zipWithIndex.map(e => (e._2, e._1)).filter(_._2 != 0.0))
}
scala> val r2 = rows
r2: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] = MappedRDD[2] at map at <console>:14
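The map step above turns each text line into the two arguments passed to Vectors.sparse: the row length and the list of nonzero (index, value) pairs. The same parsing logic can be checked in plain Scala without a SparkContext. This is a minimal sketch that assumes, as in the sample files, that values are separated by exactly one space; ParseRowSketch and parseSparseRow are hypothetical names, not part of Spark.

```scala
object ParseRowSketch {
  // Mirrors the question's map step: split on a single space, parse each
  // token as a Double, pair each value with its column index, and keep only
  // nonzero entries. Returns (size, nonzero (index, value) pairs), i.e. the
  // arguments the question passes to Vectors.sparse.
  // Assumption: exactly one space between values (no tabs or repeated spaces).
  def parseSparseRow(line: String): (Int, Seq[(Int, Double)]) = {
    val values = line.split(' ').map(_.toDouble)
    val entries = values.zipWithIndex
      .map { case (v, i) => (i, v) } // (index, value), like e => (e._2, e._1)
      .filter(_._2 != 0.0)            // drop zeros, as the sparse format allows
    (values.length, entries.toSeq)
  }

  def main(args: Array[String]): Unit = {
    println(parseSparseRow("1 2 3"))
    println(parseSparseRow("0 5 0"))
  }
}
```

Note that the straight quotes in split(' ') matter: the curly quotes in the scraped version of this question are not valid Scala character literals.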
I want to add r1 and r2. Is there any way to add these two RDD[mllib.linalg.Vector]s in Apache Spark?
Recommended answer
This is actually a good question. I work with mllib regularly and did not realize these basic linear algebra operations are not easily accessible.
The point is that the underlying Breeze vectors support all of the linear algebra operations you would expect, including, of course, the basic element-wise addition you specifically mentioned.
However, the Breeze implementation is hidden from the outside world by the access modifier:
private[mllib]
So then, from the outside world/public API perspective, how do we access those primitives?
Some of them are already exposed, e.g. the squared distance (the sum of squared element-wise differences) in the Vectors object:
/**
* Returns the squared distance between two Vectors.
* @param v1 first Vector.
* @param v2 second Vector.
* @return squared distance between two Vectors.
*/
def sqdist(v1: Vector, v2: Vector): Double = {
...
}
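For reference, the value sqdist returns is the sum over i of (v1(i) - v2(i))^2. A plain-Scala sketch of that formula on dense arrays follows; it is only an illustration, not Spark's implementation (which also handles sparse vectors), and SqDistSketch is a hypothetical name.

```scala
object SqDistSketch {
  // Squared Euclidean distance: sum over i of (a(i) - b(i))^2.
  // Illustration on plain dense arrays; both arrays must have the same length.
  def sqdist(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same size")
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum
  }

  def main(args: Array[String]): Unit =
    println(sqdist(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0))) // prints 27.0
}
```

Each coordinate differs by 3, so the result is 9 + 9 + 9 = 27. In Spark 1.3 or later, Vectors.sqdist(Vectors.dense(1.0, 2.0, 3.0), Vectors.dense(4.0, 5.0, 6.0)) should return the same value.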
However, the selection of such methods is limited, and in fact does not include basic operations such as element-wise addition, subtraction, and multiplication.
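For sparse rows like the ones built in the question, element-wise addition amounts to merging the nonzero entries by index and summing values that share an index. The following is a minimal plain-Scala sketch, assuming vectors are represented as the (size, nonzero (index, value) pairs) shape that Vectors.sparse accepts; SparseAddSketch, Sparse, and add are hypothetical names, not mllib API.

```scala
object SparseAddSketch {
  // A sparse vector as (size, nonzero (index, value) pairs),
  // the same shape that Vectors.sparse(size, elements) accepts.
  type Sparse = (Int, Seq[(Int, Double)])

  // Element-wise addition: merge both entry lists by index, sum values that
  // share an index, drop entries that cancel to 0.0, and sort by index.
  def add(a: Sparse, b: Sparse): Sparse = {
    require(a._1 == b._1, "vectors must have the same size")
    val summed = (a._2 ++ b._2)
      .groupBy(_._1)
      .map { case (i, entries) => (i, entries.map(_._2).sum) }
      .filter(_._2 != 0.0)
      .toSeq
      .sortBy(_._1)
    (a._1, summed)
  }

  def main(args: Array[String]): Unit =
    println(add((3, Seq((0, 1.0), (2, 3.0))), (3, Seq((1, 2.0), (2, -3.0)))))
}
```

Unlike converting through toArray, this never materializes the zero entries, which only matters when rows are long and mostly zero.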
So here is the best approach I could find:
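- Convert the mllib vectors to Breeze vectors
- Perform the vector operations in Breeze
- Convert the Breeze result back to an mllib Vector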
Here is some sample code:
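// Alias Breeze's DenseVector so it does not clash with mllib's DenseVector
import breeze.linalg.{DenseVector => BDV}
import org.apache.spark.mllib.linalg.Vectors
val v1 = Vectors.dense(1.0, 2.0, 3.0)
val v2 = Vectors.dense(4.0, 5.0, 6.0)
// Step 1: convert the mllib vectors to Breeze vectors
val bv1 = new BDV(v1.toArray)
val bv2 = new BDV(v2.toArray)
// Steps 2 and 3: add element-wise in Breeze, then convert back to an mllib Vector
val vectout = Vectors.dense((bv1 + bv2).toArray)
// vectout: org.apache.spark.mllib.linalg.Vector = [5.0,7.0,9.0]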
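The snippet above adds two single vectors, while the question asks about two RDDs of matrix rows. One way to combine them, stated here as an assumption rather than something tested in this answer, is to key each RDD by row number with zipWithIndex, join the two keyed RDDs, and add each matched pair of rows (for example with the Breeze conversion above). RDD.zip is an alternative, but it requires both RDDs to have the same number of partitions and the same number of elements in each partition. The sketch below runs the same key-by-index, pair, and add logic on local Scala collections so it can be tried without a SparkContext; RowwiseAddSketch and addMatrices are hypothetical names.

```scala
object RowwiseAddSketch {
  // Local stand-in for two RDDs of matrix rows: each row is a dense Array[Double].
  // Key rows by position (the local analog of zipWithIndex), pair rows with the
  // same key (the analog of join), then add each pair element-wise.
  def addMatrices(m1: Seq[Array[Double]], m2: Seq[Array[Double]]): Seq[Array[Double]] = {
    require(m1.length == m2.length, "matrices must have the same number of rows")
    val keyed1 = m1.zipWithIndex.map(_.swap).toMap // row index -> row
    val keyed2 = m2.zipWithIndex.map(_.swap).toMap
    keyed1.keys.toSeq.sorted.map { i =>
      val (a, b) = (keyed1(i), keyed2(i))
      require(a.length == b.length, s"row $i: size mismatch")
      a.zip(b).map { case (x, y) => x + y }
    }
  }

  def main(args: Array[String]): Unit = {
    // The matrix from latest1.txt / latest2.txt in the question
    val m = Seq(Array(1.0, 2.0, 3.0), Array(4.0, 5.0, 6.0), Array(7.0, 8.0, 9.0))
    addMatrices(m, m).foreach(row => println(row.mkString(" ")))
    // prints:
    // 2.0 4.0 6.0
    // 8.0 10.0 12.0
    // 14.0 16.0 18.0
  }
}
```

Keying by index costs a shuffle on a real cluster, whereas zip avoids one; the keyed version trades that cost for not depending on how the two input files happen to be partitioned.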