Using reduceByKey in Apache Spark (Scala)

Views: 127 Published: 2016/5/22 15:50:38 Tags: scala apache-spark


Question

I have a list of tuples of type (user id, name, count):

val x = sc.parallelize(List( ("a" , "b" , 1) , ("a" , "b" , 1) , ("c" , "b" , 1) , ("a" , "d" , 1)))

I'm attempting to reduce this collection to a form where, for each user id, the occurrences of each name are counted.

So x above would be converted to:

(a,ArrayBuffer((d,1), (b,2)))
(c,ArrayBuffer((b,1)))

Here is the code I am currently using:

val byKey = x.map({case (id,uri,count) => (id,uri)->count})

val grouped = byKey.groupByKey
val count = grouped.map{case ((id,uri),count) => ((id),(uri,count.sum))}
val grouped2 : org.apache.spark.rdd.RDD[(String, Seq[(String, Int)])] = count.groupByKey

grouped2.foreach(println)

I'm attempting to use reduceByKey because it performs faster than groupByKey.

How can reduceByKey be used in place of the code above to produce the same mapping?

Recommended answer

Following your code:

val byKey = x.map({case (id,uri,count) => (id,uri)->count})

You could do:

val reducedByKey = byKey.reduceByKey(_ + _)

scala> reducedByKey.collect.foreach(println)
((a,d),1)
((a,b),2)
((c,b),1)
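
The result above is keyed by (id, uri), while the question asks for one entry per id. A minimal sketch of one way to get back to that shape follows. It assumes Spark 1.3 or later (where the pair-RDD implicits need no extra import and groupByKey yields an Iterable), a local-mode context, and the hypothetical names RegroupByUser and perUserCounts, none of which come from the original answer. The summation uses reduceByKey; the final step still uses groupByKey, but only on the already reduced records.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object RegroupByUser {
  // Hypothetical helper: sums counts per (id, uri), then collects the
  // (uri, total) pairs under each id.
  def perUserCounts(x: RDD[(String, String, Int)]): RDD[(String, Iterable[(String, Int)])] =
    x.map { case (id, uri, count) => (id, uri) -> count }
      .reduceByKey(_ + _)                                      // combine per (id, uri)
      .map { case ((id, uri), total) => id -> (uri -> total) } // re-key by id
      .groupByKey()                                            // group the reduced pairs

  def main(args: Array[String]): Unit = {
    // App name and master are illustrative assumptions.
    val conf = new SparkConf().setAppName("reduceByKey-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val x = sc.parallelize(List(("a", "b", 1), ("a", "b", 1), ("c", "b", 1), ("a", "d", 1)))
      perUserCounts(x).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The order of the printed entries, and of the pairs inside each group, is not guaranteed.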

PairRDDFunctions[K,V].reduceByKey takes an associative reduce function that can be applied to the value type V of the RDD[(K,V)]. In other words, you need a function f[V](e1: V, e2: V): V. In this particular case, summing Ints: (x: Int, y: Int) => x + y, or _ + _ in short underscore notation.
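
To illustrate the shape of such a function without needing a Spark cluster, here is a small plain-Scala sketch. The object ReduceFunctionShape and the helper reduceInChunks are hypothetical names made up for this example; the helper only imitates reducing separate partitions and then merging the partial results, which is why the function has to be associative.

```scala
object ReduceFunctionShape {
  // Explicit form of the reduce function for V = Int.
  val sum: (Int, Int) => Int = (x: Int, y: Int) => x + y

  // The same function in underscore notation.
  val sumShort: (Int, Int) => Int = _ + _

  // Hypothetical helper: reduce each non-empty chunk on its own,
  // then reduce the partial results with the same function.
  def reduceInChunks[V](chunks: Seq[Seq[V]])(f: (V, V) => V): V =
    chunks.filter(_.nonEmpty).map(_.reduce(f)).reduce(f)

  def main(args: Array[String]): Unit = {
    val values = Seq(1, 2, 3, 4, 5)
    // One chunk versus two chunks: an associative f gives the same total.
    println(reduceInChunks(Seq(values))(sum))                                // 15
    println(reduceInChunks(Seq(values.take(2), values.drop(2)))(sumShort))   // 15
  }
}
```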

For the record: reduceByKey performs better than groupByKey because it attempts to apply the reduce function locally, within each partition, before the shuffle/reduce phase. groupByKey forces a shuffle of all elements before grouping.
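
The difference can be pictured with a plain-Scala model that does not use Spark at all. The two-partition split of the question's data and the names MapSideCombineSketch, shuffledWithoutCombine and shuffledWithCombine are assumptions made purely for illustration; the sketch only counts how many records would have to cross the shuffle in each style.

```scala
object MapSideCombineSketch {
  type Record = ((String, String), Int)

  // Hypothetical split of the question's data across two partitions.
  val partitions: Seq[Seq[Record]] = Seq(
    Seq((("a", "b"), 1), (("a", "b"), 1)),
    Seq((("c", "b"), 1), (("a", "d"), 1))
  )

  // groupByKey-style: every record crosses the shuffle unchanged.
  def shuffledWithoutCombine(parts: Seq[Seq[Record]]): Seq[Record] =
    parts.flatten

  // reduceByKey-style: each partition first merges its own records per key,
  // so at most one record per key leaves each partition.
  def shuffledWithCombine(parts: Seq[Seq[Record]])(f: (Int, Int) => Int): Seq[Record] =
    parts.flatMap { part =>
      part.groupBy(_._1).map { case (key, recs) => key -> recs.map(_._2).reduce(f) }.toSeq
    }

  def main(args: Array[String]): Unit = {
    println(shuffledWithoutCombine(partitions).size)     // 4
    println(shuffledWithCombine(partitions)(_ + _).size) // 3
  }
}
```

With only four records the saving is a single record, but the gap grows with the number of repeated keys per partition.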
