Is there a better way to perform a reduce operation on RDD[Array[Double]]?
Question
I want to reduce an RDD[Array[Double]] so that each element of an array is added to the element at the same position in the next array. For the moment I use this code:
// rdd1: RDD[Array[Double]]
val coord = rdd1.reduce((x, y) => (x, y).zipped.map(_ + _))
Is there a more efficient way to do this? The current approach is costly in terms of performance.
Recommended answer
Using zipped.map is very inefficient, because it creates a lot of temporary objects and boxes the doubles.
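For comparison, here is a minimal dependency-free sketch of one way to avoid those temporaries: a plain while loop over primitive arrays. The helper name `addArrays` is hypothetical and not part of the original answer, and a local Seq stands in for the RDD so the example runs without Spark.

```scala
object ElementwiseSum {
  // Hypothetical helper (an assumption, not from the original answer):
  // adds two arrays element by element with a while loop, so no tuples
  // or boxed Doubles are allocated along the way.
  def addArrays(x: Array[Double], y: Array[Double]): Array[Double] = {
    require(x.length == y.length, "arrays must have the same length")
    val out = new Array[Double](x.length)
    var i = 0
    while (i < x.length) {
      out(i) = x(i) + y(i)
      i += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // A local Seq stands in for the RDD; with Spark the call would be
    // rdd1.reduce(addArrays).
    val data = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
    val coord = data.reduce(addArrays)
    println(coord.mkString(", ")) // prints: 4.0, 6.0
  }
}
```

Each call still allocates one result array, but nothing per element, which is where zipped.map tends to lose time.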
If you use Spire, you can just do this:
> import spire.implicits._
> val rdd1 = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
> val coord = rdd1.reduce(_ + _)
coord: Array[Double] = Array(4.0, 6.0)
This is much nicer to look at, and should also be much more efficient.
Spire is a dependency of Spark, so you should be able to do the above without adding any extra dependencies. At least, it worked here in a spark-shell for Spark 1.3.1.
This works for any array whose element type has an AdditiveSemigroup typeclass instance available. In this case, the element type is Double. Spire typeclasses are @specialized for Double, so no boxing occurs anywhere.
If you really want to know what is going on to make this work, you can use reify to see which implicits the compiler inserts:
> import scala.reflect.runtime.{universe => u}
> val a = Array(1.0, 2.0)
> val b = Array(3.0, 4.0)
> u.reify { a + b }
res5: reflect.runtime.universe.Expr[Array[Double]] = Expr[scala.Array[Double]](
implicits.additiveSemigroupOps(a)(
implicits.ArrayNormedVectorSpace(
implicits.DoubleAlgebra,
implicits.DoubleAlgebra,
Predef.this.implicitly)).$plus(b))
So the addition works because there is an instance of AdditiveSemigroup for Array[Double].
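To make the mechanism concrete, here is a simplified sketch of how an array instance can be derived from an element instance. All names here (`AddSemigroup`, `doubleAdd`, `arrayAdd`, `AddOps`, `|+|`) are hypothetical stand-ins and not Spire's actual definitions. Unlike Spire, this sketch is not @specialized, so it illustrates only the implicit resolution, not the performance.

```scala
import scala.reflect.ClassTag

object SemigroupSketch {
  // Hypothetical, simplified stand-in for an additive semigroup typeclass.
  trait AddSemigroup[A] {
    def plus(x: A, y: A): A
  }

  // Instance for the element type Double.
  implicit val doubleAdd: AddSemigroup[Double] = new AddSemigroup[Double] {
    def plus(x: Double, y: Double): Double = x + y
  }

  // Instance for Array[A], derived from the instance for A.
  implicit def arrayAdd[A](implicit ev: AddSemigroup[A], ct: ClassTag[A]): AddSemigroup[Array[A]] =
    new AddSemigroup[Array[A]] {
      def plus(x: Array[A], y: Array[A]): Array[A] = {
        require(x.length == y.length, "arrays must have the same length")
        val out = new Array[A](x.length)
        var i = 0
        while (i < x.length) {
          out(i) = ev.plus(x(i), y(i))
          i += 1
        }
        out
      }
    }

  // Adds an operator to any type that has an AddSemigroup instance.
  // |+| is used instead of + to stay clear of the built-in string concatenation.
  implicit class AddOps[A](lhs: A) {
    def |+|(rhs: A)(implicit ev: AddSemigroup[A]): A = ev.plus(lhs, rhs)
  }

  def main(args: Array[String]): Unit = {
    val a = Array(1.0, 2.0)
    val b = Array(3.0, 4.0)
    // Resolves to AddOps(a).|+|(b)(arrayAdd(doubleAdd, ClassTag.Double)).
    println((a |+| b).mkString(", ")) // prints: 4.0, 6.0
  }
}
```

The expression `a |+| b` expands in the same way as the reify output: the compiler wraps `a` in an ops class and supplies the array instance, which in turn is built from the Double instance.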