Apache Spark - Generate List Of Pairs

Question

Given a large file in which each line has the form (V1,V2,...,VN), for example:

2,5
2,8,9
2,5,8
...

I am trying to produce a list of pairs with their counts, similar to the following, using Spark:

((2,5),2)
((2,8),2)
((2,9),1)
((8,9),1)
((5,8),1)

I tried the suggestions given in response to an older question, but I ran into some issues. For example, with

val dataRead = sc.textFile(inputFile)
val itemCounts = dataRead
  .flatMap(line => line.split(","))
  .map(item => (item, 1))
  .reduceByKey((a, b) => a + b)
  .cache()
val nums = itemCounts.keys
  .filter({case (a) => a.length > 0})
  .map(x => x.trim.toInt)
val pairs = nums.flatMap(x => nums.map(y => (x,y)))

I get the error:

scala> val pairs = nums.flatMap(x => nums.map(y => (x,y)))
<console>:27: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[(Int, Int)]
 required: TraversableOnce[?]
       val pairs = nums.flatMap(x => nums.map(y => (x,y)))
                                             ^

Could someone please point me towards what I might be doing incorrectly, or what might be a better way to achieve the same? Many thanks in advance.

Recommended answer

You can use the combinations method of Array to achieve this.

val dataRead = sc.textFile(inputFile)
// "2,5"
// "2,8,9"
// "2,5,8" 
//  ...

val combinations = dataRead.flatMap { line =>
        line.split(",")        // "2,8,9" => Array("2","8","9")
            .combinations(2)   // Iterator over every 2-element subset of the line
            .toSeq             // ~ Array(Array(2,8), Array(2,9), Array(8,9))
            .map(arr => arr(0) -> arr(1))  // Array((2,8), (2,9), (8,9))
}

// Array((2,5), (2,8), (2,9), (8,9), (2,5), (2,8), (5, 8), ...)

val result = combinations.map(item => item -> 1) // Array(((2,5),1), ((2,9),1), ...)
                         .reduceByKey(_ + _)   
// Array(((2,5),2), ((2,8),2), ((2,9),1), ((8,9),1), ((5,8),1) ....) 
// order may be different.
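
The type mismatch in the question arises because RDD.flatMap expects a function that returns a local collection (TraversableOnce), while nums.map(...) returns another RDD; Spark does not support nesting one RDD inside another's transformation. The approach above avoids this by building the pairs for each line with ordinary collection operations. A minimal sketch of that per-line logic using plain Scala collections (no Spark) is shown below. The names PairCounts, pairsOf and countPairs are hypothetical, and the trimming, de-duplication and sorting steps are assumptions added here so that "5,2" and "2,5" count as the same pair; the original answer keeps the elements as strings in file order.

```scala
object PairCounts {
  // Hypothetical helper: parse one comma-separated line and return
  // every unordered pair of its values, smaller value first.
  def pairsOf(line: String): Seq[(Int, Int)] =
    line.split(",")
      .map(_.trim)
      .filter(_.nonEmpty)       // skip empty fields, e.g. from a blank line
      .map(_.toInt)
      .distinct                 // assumption: repeated values in a line count once
      .sorted                   // assumption: (5,2) and (2,5) are the same pair
      .combinations(2)          // all 2-element subsets of the line
      .map(arr => (arr(0), arr(1)))
      .toSeq

  // Local stand-in for map(pair => pair -> 1).reduceByKey(_ + _).
  def countPairs(lines: Seq[String]): Map[(Int, Int), Int] =
    lines.flatMap(pairsOf)
      .groupBy(identity)
      .map { case (pair, occurrences) => pair -> occurrences.size }
}
```

With Spark, the same helper should carry over as dataRead.flatMap(PairCounts.pairsOf).map(_ -> 1).reduceByKey(_ + _), since the function passed to flatMap then returns a local Seq rather than an RDD.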
