Is there a better way to do a reduce operation on RDD[Array[Double]]?

Question

I want to reduce an RDD[Array[Double]] so that each element of an array is added to the corresponding element of the next array. This is the code I use at the moment:

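import org.apache.spark.rdd.RDD
// Example data; sc is the SparkContext (predefined in spark-shell)
val rdd1: RDD[Array[Double]] = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
// Sum the arrays element-wise: (x, y).zipped pairs up elements, map adds each pair
val coord: Array[Double] = rdd1.reduce((x, y) => (x, y).zipped.map(_ + _))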

Is there a better way to do this more efficiently? The current approach is expensive.

Answer

Using zipped.map is very inefficient, because it creates a lot of temporary objects and boxes the doubles.
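
To make the boxing point concrete, here is a minimal sketch of an element-wise sum written by hand with a while loop over primitive arrays. It is an alternative illustration, not the Spire approach this answer recommends; the names `ElementwiseSum` and `addArrays` are hypothetical, and a local `Seq` stands in for the RDD so the sketch runs without Spark. Element-wise addition is associative and commutative, which is what `RDD.reduce` requires of its function.

```scala
object ElementwiseSum {
  // Hypothetical helper: adds two equal-length Array[Double] element by element.
  // The while loop reads and writes primitive doubles directly, so the only
  // allocation is the result array (no tuples, no boxed java.lang.Double).
  def addArrays(x: Array[Double], y: Array[Double]): Array[Double] = {
    require(x.length == y.length, "arrays must have the same length")
    val out = new Array[Double](x.length)
    var i = 0
    while (i < x.length) {
      out(i) = x(i) + y(i)
      i += 1
    }
    out
  }

  def main(args: Array[String]): Unit = {
    // A local Seq stands in for the RDD in this sketch
    val data = Seq(Array(1.0, 2.0), Array(3.0, 4.0))
    val coord = data.reduce((x, y) => addArrays(x, y))
    println(coord.mkString("Array(", ", ", ")")) // prints Array(4.0, 6.0)
  }
}
```

With Spark, the same function could be passed as `rdd1.reduce(ElementwiseSum.addArrays)`.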

If you use Spire, you can just do this:

> import spire.implicits._
> val rdd1 = sc.parallelize(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
> val coord = rdd1.reduce(_ + _)
coord: Array[Double] = Array(4.0, 6.0)

This is much nicer to look at, and should also be much more efficient.

Spire is a dependency of Spark, so you should be able to do the above without adding any extra dependencies. At least, it worked here in a spark-shell for Spark 1.3.1.

This works for any array whose element type has an AdditiveSemigroup typeclass instance available. Here the element type is Double. Spire's typeclasses are @specialized for Double, so no boxing happens anywhere.
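
The mechanism described above (a typeclass instance for the element type giving rise to one for the array, plus syntax that turns `a + b` into a call on that instance) can be sketched without Spire. Everything below is hypothetical and much simpler than Spire: `TypeclassSketch`, `AddSemigroup`, `DoubleAdd`, `arrayAdd`, `AddOps` and `sumAll` are illustrative names, not Spire's API. Unlike Spire, this generic array instance is not specialized, so it does not guarantee the absence of boxing.

```scala
import scala.reflect.ClassTag

object TypeclassSketch {
  // Hypothetical typeclass: an associative "plus" for values of type A.
  // @specialized(Double) asks the compiler to also emit a Double-specific variant.
  trait AddSemigroup[@specialized(Double) A] {
    def plus(x: A, y: A): A
  }

  // Instance for the element type Double.
  implicit object DoubleAdd extends AddSemigroup[Double] {
    def plus(x: Double, y: Double): Double = x + y
  }

  // Derived instance: arrays add element-wise whenever the element type has an
  // instance. The ClassTag is needed to allocate the generic result array.
  implicit def arrayAdd[A](implicit ct: ClassTag[A], ev: AddSemigroup[A]): AddSemigroup[Array[A]] =
    new AddSemigroup[Array[A]] {
      def plus(x: Array[A], y: Array[A]): Array[A] = {
        require(x.length == y.length, "arrays must have the same length")
        val out = new Array[A](x.length)
        var i = 0
        while (i < x.length) {
          out(i) = ev.plus(x(i), y(i))
          i += 1
        }
        out
      }
    }

  // Syntax: lets `a + b` compile for any A that has an AddSemigroup[A] in scope.
  implicit class AddOps[A](lhs: A)(implicit sg: AddSemigroup[A]) {
    def +(rhs: A): A = sg.plus(lhs, rhs)
  }

  // `_ + _` on Array[Double] resolves to AddOps(x)(arrayAdd[Double]).+(y)
  def sumAll(xs: Seq[Array[Double]]): Array[Double] = xs.reduce(_ + _)

  def main(args: Array[String]): Unit = {
    val coord = sumAll(Seq(Array(1.0, 2.0), Array(3.0, 4.0)))
    println(coord.mkString(", ")) // prints 4.0, 6.0
  }
}
```

The shape mirrors what the reify output shows for Spire: an ops wrapper around the left operand, and an array instance built from the element type's instance.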

If you want to see what makes this work, you have to use reify:

> import scala.reflect.runtime.{universe => u}
> val a = Array(1.0, 2.0)
> val b = Array(3.0, 4.0)
> u.reify { a + b }

res5: reflect.runtime.universe.Expr[Array[Double]] = Expr[scala.Array[Double]](
  implicits.additiveSemigroupOps(a)(
    implicits.ArrayNormedVectorSpace(
      implicits.DoubleAlgebra, 
      implicits.DoubleAlgebra,
      Predef.this.implicitly)).$plus(b))

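So the addition works because there is an AdditiveSemigroup instance for Array[Double]: the expansion shows `a` being wrapped by `additiveSemigroupOps`, with the instance for the array supplied by `ArrayNormedVectorSpace`, which is itself built from the `DoubleAlgebra` instances for the element type.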
