Apache Spark - Generate List Of Pairs
Problem Description
Given a large file containing data of the form (V1,V2,...,VN):
2,5
2,8,9
2,5,8
...
I am trying to produce a list of pairs similar to the following using Spark:
((2,5),2)
((2,8),2)
((2,9),1)
((8,9),1)
((5,8),1)
I tried the suggestions mentioned in response to an older question, but I have encountered some issues. For example,
val dataRead = sc.textFile(inputFile)
val itemCounts = dataRead
.flatMap(line => line.split(","))
.map(item => (item, 1))
.reduceByKey((a, b) => a + b)
.cache()
val nums = itemCounts.keys
.filter({case (a) => a.length > 0})
.map(x => x.trim.toInt)
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
I got the following error:
scala> val pairs = nums.flatMap(x => nums.map(y => (x,y)))
<console>:27: error: type mismatch;
found : org.apache.spark.rdd.RDD[(Int, Int)]
required: TraversableOnce[?]
val pairs = nums.flatMap(x => nums.map(y => (x,y)))
^
Could someone please point me towards what I might be doing incorrectly, or what might be a better way to achieve the same? Many thanks in advance.
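For context, the type mismatch arises because Spark does not allow one RDD to be referenced inside a transformation of another: flatMap expects a function returning a local TraversableOnce, and nums is an RDD, not a local collection. If the goal really were to pair every item with every other item across the whole dataset, cartesian would be the RDD-level equivalent — a minimal sketch (note this builds a global cross product, which is not the per-line pairing shown in the expected output above):

val allPairs = nums.cartesian(nums)  // RDD[(Int, Int)] of every (x, y) combination
  .filter { case (x, y) => x < y }   // keep each unordered pair exactly once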
Answer
You can use the combinations method on arrays to achieve this.
val dataRead = sc.textFile(inputFile)
// "2,5"
// "2,8,9"
// "2,5,8"
// ...
val combinations = dataRead.flatMap { line =>
line.split(",")                       // "2,8,9" => Array("2", "8", "9")
  .combinations(2)                    // Iterator over all 2-element subsets
  .toSeq                              // ~ Seq(Array("2","8"), Array("2","9"), Array("8","9"))
  .map { case Array(a, b) => a -> b } // Seq(("2","8"), ("2","9"), ("8","9"))
}
// Array((2,5), (2,8), (2,9), (8,9), (2,5), (2,8), (5, 8), ...)
val result = combinations.map(item => item -> 1) // Array(((2,5),1), ((2,9),1), ...)
.reduceByKey(_ + _)
// Array(((2,5),2), ((2,8),2), ((2,9),1), ((8,9),1), ((5,8),1) ....)
// order may be different.
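One detail worth noting: split(",") yields strings, so the keys in the result above are (String, String) pairs. A variant of the same approach, as an untested sketch, assuming integer pairs are wanted and that each line's items should be paired in ascending order to match the expected output:

// Assumption: items are integers and pair order within a line doesn't matter.
// Sorting each line's items first makes combinations emit pairs in a
// canonical (ascending) order before counting.
val result = dataRead.flatMap { line =>
  line.split(",").map(_.trim.toInt).sorted
    .combinations(2)
    .map { case Array(a, b) => (a, b) }
}.map(pair => pair -> 1)
 .reduceByKey(_ + _)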