How to convert an Iterable to an RDD


Question

To be more specific, how can I convert a scala.Iterable to an org.apache.spark.rdd.RDD?

I have an RDD of (String, Iterable[(String, Integer)]) and I want to convert it into an RDD of (String, RDD[(String, Integer)]), so that I can apply reduceByKey to the inner RDD.

For example, I have an RDD where the key is the two-letter prefix of a person's name and the value is a List of pairs of person name and the hours they spent at an event.

My RDD is:

("To", List(("Tom",50),("Tod",30),("Tom",70),("Tod",25),("Tod",15)))
("Ja", List(("Jack",50),("James",30),("Jane",70),("James",25),("Jasper",15)))

I need the List converted to an RDD so that I can accumulate each person's total hours by applying reduceByKey, giving a result like:

("To", RDD(("Tom",120),("Tod",70)))
("Ja", RDD(("Jack",50),("James",55),("Jane",70),("Jasper",15)))

But I couldn't find any such transformation function. How can I do this?

Thanks.

Recommended answer

You can achieve this with flatMap and reduceByKey. Something like this:

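// Flatten to ((key, name), hours) so each person gets a composite key
rdd.flatMap { case (key, list) => list.map(item => ((key, item._1), item._2)) }
   .reduceByKey(_ + _)   // sum the hours per (key, name)
   .map { case ((key, name), hours) => (key, List((name, hours))) }  // re-key by prefix
   .reduceByKey(_ ++ _)  // concatenate the per-person totals for each prefix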
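
Note that the result is an RDD of (String, List[(String, Int)]) rather than an RDD of RDDs: Spark does not support nesting one RDD inside another, so the inner collection stays a plain Scala List. The following is a minimal, self-contained sketch of the same idea using the sample data from the question. The object name HoursPerPerson, the function names totalsByPrefix and totalsLocally, the local-mode SparkSession, and the use of Int for the hours are all assumptions made for illustration. The second function is an alternative that is not part of the answer above: because each key's values are already gathered in one Iterable, the totals can also be computed locally per key with ordinary Scala collections.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical names throughout; a sketch, not a definitive implementation.
object HoursPerPerson {

  // Same approach as the answer: flatten, sum per (prefix, name), regroup by prefix.
  def totalsByPrefix(rdd: RDD[(String, Iterable[(String, Int)])]): RDD[(String, List[(String, Int)])] =
    rdd
      .flatMap { case (prefix, pairs) =>
        pairs.map { case (name, hours) => ((prefix, name), hours) }
      }
      .reduceByKey(_ + _)
      .map { case ((prefix, name), hours) => (prefix, List((name, hours))) }
      .reduceByKey(_ ++ _)

  // Alternative (an assumption, not from the answer): aggregate inside each
  // already-grouped Iterable with plain Scala collections, avoiding a shuffle.
  def totalsLocally(rdd: RDD[(String, Iterable[(String, Int)])]): RDD[(String, List[(String, Int)])] =
    rdd.mapValues { pairs =>
      pairs.groupBy(_._1).map { case (name, xs) => (name, xs.map(_._2).sum) }.toList
    }

  def main(args: Array[String]): Unit = {
    // Local-mode session, used only so the sketch can run standalone.
    val spark = SparkSession.builder().master("local[*]").appName("HoursPerPerson").getOrCreate()

    // Sample data from the question, with hours written as Int.
    val data: Seq[(String, Iterable[(String, Int)])] = Seq(
      ("To", List(("Tom", 50), ("Tod", 30), ("Tom", 70), ("Tod", 25), ("Tod", 15))),
      ("Ja", List(("Jack", 50), ("James", 30), ("Jane", 70), ("James", 25), ("Jasper", 15)))
    )
    val rdd = spark.sparkContext.parallelize(data)

    // Element order within each list is not guaranteed.
    totalsByPrefix(rdd).collect().foreach(println)
    totalsLocally(rdd).collect().foreach(println)

    spark.stop()
  }
}
```

With this data, both functions should report 120 hours for Tom and 70 for Tod under the "To" prefix. The local variant is only reasonable when each key's Iterable fits comfortably in memory on one executor, which is already the case for data produced by groupByKey.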
