Spark: Average of values instead of sum in reduceByKey using Scala

Views: 203 · Published: 2020/9/4 5:56:57 · Tags: scala, apache-spark

Question

When I call reduceByKey with (x, y) => x + y, it sums all values that share the same key. Is there a way to calculate the average of the values for each key instead?

// I calculate the sum like this and don't know how to calculate the avg
reduceByKey((x,y)=>(x+y)).collect


Array(((Type1,1),4.0), ((Type1,1),9.2), ((Type1,2),8), ((Type1,2),4.5), ((Type1,3),3.5), 
((Type1,3),5.0), ((Type2,1),4.6), ((Type2,1),4), ((Type2,1),10), ((Type2,1),4.3))

Recommended answer

One way is to use mapValues and reduceByKey, which is easier than aggregateByKey. Pair each value with a count of 1, add up the sums and the counts per key, then divide each sum by its count.

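.mapValues(value => (value, 1)) // pair each value with a count of 1
.reduceByKey {
  // add up the sums and the counts for each key
  case ((sumL, countL), (sumR, countR)) =>
    (sumL + sumR, countL + countR)
}
.mapValues {
  // average = sum / count; assumes Double values (with Int values, call .toDouble on sum first)
  case (sum, count) => sum / count
}
.collect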
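
To try the snippet above end to end, here is a minimal self-contained sketch. The object name AverageByKey, the helper averageByKey, the local[*] master, and the sample data (taken from the question, with every value written as a Double) are assumptions for illustration, not part of the original answer.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.reflect.ClassTag

object AverageByKey {

  // Hypothetical helper: per-key average using mapValues + reduceByKey.
  def averageByKey[K: ClassTag](pairs: RDD[(K, Double)]): RDD[(K, Double)] =
    pairs
      .mapValues(value => (value, 1))            // (value, count = 1)
      .reduceByKey { case ((sumL, countL), (sumR, countR)) =>
        (sumL + sumR, countL + countR)           // per-key (sum, count)
      }
      .mapValues { case (sum, count) => sum / count }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("AverageByKey")
      .master("local[*]")
      .getOrCreate()

    // Sample pairs from the question, all values as Double.
    val data = spark.sparkContext.parallelize(Seq(
      (("Type1", 1), 4.0), (("Type1", 1), 9.2),
      (("Type1", 2), 8.0), (("Type1", 2), 4.5),
      (("Type1", 3), 3.5), (("Type1", 3), 5.0),
      (("Type2", 1), 4.6), (("Type2", 1), 4.0),
      (("Type2", 1), 10.0), (("Type2", 1), 4.3)
    ))

    averageByKey(data).collect().foreach(println)
    spark.stop()
  }
}
```

With this input, key (Type1,2) averages 8.0 and 4.5 to 6.25. Carrying the count alongside the sum is what makes the reduction valid: averages cannot be merged pairwise, but (sum, count) pairs can.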

https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch04.html
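
For comparison, the aggregateByKey alternative mentioned in the answer might look like the sketch below. It builds the (sum, count) accumulator directly instead of mapping each value to a tuple first. The object name AverageWithAggregate, the helper name, the local[*] master, and the three sample pairs are assumptions for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession
import scala.reflect.ClassTag

object AverageWithAggregate {

  // Hypothetical helper: per-key average using aggregateByKey.
  def averageByKey[K: ClassTag](pairs: RDD[(K, Double)]): RDD[(K, Double)] =
    pairs
      .aggregateByKey((0.0, 0L))(
        // seqOp: fold one value into the partition-local (sum, count)
        (acc, value) => (acc._1 + value, acc._2 + 1L),
        // combOp: merge (sum, count) pairs coming from different partitions
        (a, b) => (a._1 + b._1, a._2 + b._2)
      )
      .mapValues { case (sum, count) => sum / count }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .appName("AverageWithAggregate")
      .master("local[*]")
      .getOrCreate()

    val data = spark.sparkContext.parallelize(Seq(
      (("Type1", 2), 8.0), (("Type1", 2), 4.5), (("Type1", 3), 3.5)
    ))

    averageByKey(data).collect().foreach(println)
    spark.stop()
  }
}
```

The zero value (0.0, 0L) and the two functions are the extra pieces that make this version a little more involved than mapValues plus reduceByKey, which is the trade-off the answer points at.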
