Can reduceByKey be used to change type and combine values - Scala Spark?


Question

In the code below I'm attempting to combine values:

// An RDD of (String, Double) key/value pairs; sc is the SparkContext
val rdd: org.apache.spark.rdd.RDD[(String, Double)] =
  sc.parallelize(List(
    ("a", 1.0),
    ("a", 3.0),
    ("a", 2.0)
  ))

// Fails to compile: reduceByKey here requires a function of type (Double, Double) => Double
val reduceByKey = rdd.reduceByKey((a, b) => String.valueOf(a) + String.valueOf(b))

reduceByKey should contain (a, 1,3,2) but instead produces a compile-time error:

Multiple markers at this line:
- type mismatch; found: String, required: Double
- type mismatch; found: String, required: Double

What determines the type of the reduce function? Can the type not be converted?

I could use groupByKey to achieve the same result, but I just want to understand reduceByKey.
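
For reference, the groupByKey version I have in mind looks roughly like this (an illustrative sketch against the rdd defined above, not tested code):

// groupByKey collects every value for a key into an Iterable[Double],
// so mapValues is free to produce a value of a different type (String here)
val grouped: org.apache.spark.rdd.RDD[(String, String)] =
  rdd.groupByKey().mapValues(_.map(_.toString).mkString)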

Answer

No, given an RDD of type RDD[(K, V)], reduceByKey will take an associative function of type (V, V) => V.
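
For example, a reduction that keeps the value type compiles without complaint; a minimal sketch using the rdd defined in the question:

// (Double, Double) => Double matches the required (V, V) => V signature,
// so this compiles and yields ("a", 6.0) for the sample data
val summed: org.apache.spark.rdd.RDD[(String, Double)] =
  rdd.reduceByKey(_ + _)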

If we want to apply a reduction that changes the type of the values to another arbitrary type, then we can use aggregateByKey:

def aggregateByKey[U](zeroValue: U)(seqOp: (U, V) ⇒ U, combOp: (U, U) ⇒ U)

Using the zeroValue and the seqOp function, it provides a fold-like operation on the map side, while the associative function combOp combines the results of the seqOp into the final result, much as reduceByKey would do. As we can see from the signature, while the collection values are of type V, the result of aggregateByKey will be of an arbitrary type U.

Applied to the example above, aggregateByKey would look like this:

rdd.aggregateByKey("")(
  (aggr, value) => aggr + String.valueOf(value), // seqOp: fold each Double into the running String
  (aggr1, aggr2) => aggr1 + aggr2                // combOp: merge partial strings across partitions
)
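
On the sample data this would collect to something like ("a", "1.03.02.0"): String.valueOf renders each Double with its decimal point, and the order in which the per-partition partial strings are concatenated depends on partitioning, so it is not guaranteed.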
