How to convert Scala RDD to Map
Question
I have an RDD of strings (org.apache.spark.rdd.RDD[String] = MappedRDD[18]) and want to convert it to a map with unique IDs. I ran 'val vertexMAp = vertices.zipWithUniqueId', but this gave me another RDD, of type 'org.apache.spark.rdd.RDD[(String, Long)]', whereas I want a 'Map[String, Long]'. How can I convert my 'org.apache.spark.rdd.RDD[(String, Long)]' to a 'Map[String, Long]'?
Thanks
Recommended answer
There's a built-in collectAsMap function in PairRDDFunctions that returns the key-value pairs of the RDD to the driver as a Map.
val vertexMap = vertices.zipWithUniqueId.collectAsMap
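For reference, here is a minimal self-contained sketch of the one-liner above. It assumes Spark 1.3 or later (older versions need the org.apache.spark.SparkContext._ import for the pair-RDD methods) and a local master. The object name, the helper toVertexMap, and the sample vertex names are made up for illustration. Note that collectAsMap returns a scala.collection.Map, so .toMap is added to get the immutable Map[String, Long] the question asks for.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object CollectAsMapSketch {
  // Hypothetical helper: assigns a unique ID to each element, then collects
  // all pairs on the driver. Only safe when the result fits in driver memory.
  def toVertexMap(vertices: RDD[String]): Map[String, Long] =
    vertices.zipWithUniqueId().collectAsMap().toMap

  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for illustration.
    val conf = new SparkConf().setMaster("local[2]").setAppName("collect-as-map-sketch")
    val sc = new SparkContext(conf)
    try {
      // Stand-in for the asker's RDD[String] of vertices.
      val vertices = sc.parallelize(Seq("vertexX", "vertexY", "vertexZ"))
      val vertexMap = toVertexMap(vertices)
      vertexMap.foreach { case (name, id) => println(s"$name -> $id") }
    } finally {
      sc.stop()
    }
  }
}
```

If the RDD contains the same string more than once, the resulting map keeps only one ID for that key, so deduplicating first (for example with distinct) may be worth considering.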
It's important to remember that an RDD is a distributed data structure. You can visualize it as 'pieces' of your data spread over the cluster. When you collect, you force all those pieces to go to the driver, and for that to work they need to fit in the driver's memory.
From the comments, it looks like you need to deal with a large dataset. Making a Map out of it will not work because it won't fit in the driver's memory; trying will cause an out-of-memory (OOM) error.
You probably need to keep the dataset as an RDD. If you are creating the Map only to look up elements, you can use lookup on the pair RDD instead, like this:
import org.apache.spark.SparkContext._ // import implicit conversions that provide PairRDDFunctions
val vertexMap = vertices.zipWithUniqueId
val vertexYId = vertexMap.lookup("vertexY") // returns a Seq[Long] with the values stored under this key
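Because lookup returns a Seq rather than a single value, it may help to see how a caller might unwrap the result. The sketch below is only an illustration under assumptions: Spark 1.3 or later, a local master, and made-up names (LookupSketch, idOf, and the sample vertices). Caching the pair RDD is a choice made here so that repeated lookups do not recompute zipWithUniqueId.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LookupSketch {
  // Hypothetical helper: lookup returns every value stored under the key,
  // so headOption yields the first ID, or None when the key is absent.
  def idOf(vertexIds: RDD[(String, Long)], name: String): Option[Long] =
    vertexIds.lookup(name).headOption

  def main(args: Array[String]): Unit = {
    // Local master and app name are placeholders for illustration.
    val conf = new SparkConf().setMaster("local[2]").setAppName("lookup-sketch")
    val sc = new SparkContext(conf)
    try {
      // The data stays distributed; nothing is collected into a driver-side Map.
      val vertexIds = sc
        .parallelize(Seq("vertexX", "vertexY", "vertexZ"))
        .zipWithUniqueId()
        .cache()

      println(idOf(vertexIds, "vertexY"))  // Some(<id assigned to vertexY>)
      println(idOf(vertexIds, "missing"))  // None
    } finally {
      sc.stop()
    }
  }
}
```

Each lookup call runs a Spark job. Without a known partitioner it has to scan the partitions for the key, so this approach suits occasional lookups better than lookups inside a tight loop.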