Using reduceByKey in Apache Spark (Scala)
Question
I have a list of tuples of type (user id, name, count):
val x = sc.parallelize(List( ("a" , "b" , 1) , ("a" , "b" , 1) , ("c" , "b" , 1) , ("a" , "d" , 1)))
I'm attempting to reduce this collection to a type where each element name is counted.
So the above x is converted to:
(a,ArrayBuffer((d,1), (b,2)))
(c,ArrayBuffer((b,1)))
Here is the code I am currently using:
val byKey = x.map({case (id,uri,count) => (id,uri)->count})
val grouped = byKey.groupByKey
val count = grouped.map{case ((id,uri),count) => ((id),(uri,count.sum))}
val grouped2 : org.apache.spark.rdd.RDD[(String, Seq[(String, Int)])] = count.groupByKey
grouped2.foreach(println)
I'm attempting to use reduceByKey as it performs faster than groupByKey.
How can reduceByKey be implemented instead of the above code to provide the same mapping?
Answer
Following your code:
val byKey = x.map({case (id,uri,count) => (id,uri)->count})
You can do:
val reducedByKey = byKey.reduceByKey(_ + _)
scala> reducedByKey.collect.foreach(println)
((a,d),1)
((a,b),2)
((c,b),1)
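Note that the output above is still keyed by the (id, uri) pair; to reach the (id, ArrayBuffer((uri, count), …)) shape asked for in the question, a second re-keying step is needed. Here is a minimal local-collections sketch of that full pipeline (plain Scala, no SparkContext, so groupBy stands in for the RDD operations; on the RDD itself the equivalent steps would be reduceByKey, then map, then groupByKey):

```scala
// Local-collections sketch (no SparkContext needed) of the full pipeline.
// On the RDD the equivalent would be:
//   byKey.reduceByKey(_ + _)
//        .map { case ((id, uri), count) => (id, (uri, count)) }
//        .groupByKey
val x = List(("a", "b", 1), ("a", "b", 1), ("c", "b", 1), ("a", "d", 1))

// Step 1: analogue of byKey.reduceByKey(_ + _) -- sum counts per (id, uri) key.
val reduced: Map[(String, String), Int] =
  x.groupBy { case (id, uri, _) => (id, uri) }
   .map { case (key, rows) => key -> rows.map(_._3).sum }

// Step 2: re-key by id alone to obtain (id, List((uri, count))).
val byId: Map[String, List[(String, Int)]] =
  reduced.toList
    .map { case ((id, uri), count) => (id, (uri, count)) }
    .groupBy { case (id, _) => id }
    .map { case (id, pairs) => id -> pairs.map(_._2) }
```

The expensive pairwise summing is done by the reduceByKey analogue in step 1, so the grouping in step 2 only shuffles one record per (id, uri) key.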
PairRDDFunctions[K,V].reduceByKey
takes an associative reduce function that can be applied to the type V of the RDD[(K,V)]. In other words, you need a function f[V](e1: V, e2: V): V.
In this particular case, summing Ints: (x: Int, y: Int) => x + y,
or _ + _
in shorthand underscore notation.
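The two notations denote the same function; a quick check with a plain Scala reduce:

```scala
// The explicit lambda and the underscore shorthand are interchangeable.
val explicit  = List(1, 2, 3).reduce((x: Int, y: Int) => x + y)
val shorthand = List(1, 2, 3).reduce(_ + _)
// both evaluate to 6
```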
For the record: reduceByKey
performs better than groupByKey
because it attempts to apply the reduce function locally before the shuffle/reduce phase. groupByKey
will force a shuffle of all elements before grouping.
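Why the local pre-combine matters can be sketched with plain collections, modeling two partitions of hypothetical data (a toy model of Spark's map-side combine, not its actual shuffle machinery):

```scala
// Toy model of Spark's map-side combine: two "partitions" of (key, count) records.
val partitions = List(
  List(("a", 1), ("a", 1)),   // partition 0
  List(("c", 1), ("a", 1))    // partition 1
)

// reduceByKey-style: combine within each partition first...
val locallyCombined: List[Map[String, Int]] =
  partitions.map(_.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum })

// ...so only one record per key per partition crosses the "shuffle".
val shuffledRecords = locallyCombined.map(_.size).sum   // 3 records

// groupByKey-style: every input record crosses the shuffle.
val groupByKeyRecords = partitions.map(_.size).sum      // 4 records

// Final merge after the shuffle gives the same totals either way.
val merged: Map[String, Int] =
  locallyCombined.flatten.groupBy(_._1).map { case (k, vs) => k -> vs.map(_._2).sum }
```

On a toy dataset the saving is one record; on a real RDD with many duplicate keys per partition, the pre-combine can shrink shuffle traffic dramatically.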