How to deal with error SPARK-5063 in Spark
Problem description
I get the error message SPARK-5063 on the println line:
d.foreach { x => for (i <- 0 until x.length)
println(m.lookup(x(i))) }
d is RDD[Array[String]] and m is RDD[(String, String)]. Is there any way to print the way I want? Or how can I convert d from RDD[Array[String]] to Array[String]?
Recommended answer
SPARK-5063 is about giving better error messages when code tries to nest RDD operations, which is not supported.
It's a usability issue, not a functional one. The root cause is the nesting of RDD operations (here, calling m.lookup inside d.foreach), and the solution is to break that nesting up.
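One way to break up the nesting while keeping the lookup call from the question is to bring d to the driver first and call m.lookup from there, where RDD operations are allowed. The following is a minimal sketch, not a definitive implementation: the object name DriverSideLookup, the method lookupOnDriver, and the sample data are hypothetical, and it assumes d is small enough to collect to the driver. It also answers the second part of the question, since flatMap followed by collect turns RDD[Array[String]] into a local Array[String].

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: names and sample data are illustrative only.
object DriverSideLookup {

  // Flattens d into a local Array[String], then looks each word up in m.
  // Assumption: d fits in driver memory.
  def lookupOnDriver(d: RDD[Array[String]],
                     m: RDD[(String, String)]): Array[(String, Seq[String])] = {
    // RDD[Array[String]] -> RDD[String] -> local Array[String]
    val words: Array[String] = d.flatMap(_.toSeq).collect()
    // This map runs on the driver over a plain Array, so calling m.lookup is
    // not a nested RDD operation. Each lookup launches its own Spark job.
    words.map(w => w -> m.lookup(w))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("driver-side-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val d = sc.parallelize(Seq(Array("one", "two"), Array("three")))
      val m = sc.parallelize(Seq("one" -> "1", "two" -> "2"))
      lookupOnDriver(d, m).foreach(println) // printed by the driver process
    } finally {
      sc.stop()
    }
  }
}
```

Because every lookup is a separate job, this only makes sense for a handful of keys; for anything larger, a join or a broadcast table is the better fit.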
What the question's code attempts is effectively a join of dRDD and mRDD. If mRDD is large, rdd.join is the recommended way. Otherwise, if mRDD is small, i.e. it fits in the memory of each executor, we could collect it, broadcast it, and do a 'map-side' join.
A simple join would go like this:
val rdd = sc.parallelize(Seq(Array("one","two","three"), Array("four", "five", "six")))
val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six"->6))
val flat = rdd.flatMap(_.toSeq).keyBy(x=>x)
val res = flat.join(map).map{case (k,v) => v}
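Note that join is an inner join, so elements of the arrays that have no entry in the lookup RDD are silently dropped from the result. If those elements should be kept, a left outer join is one option. The sketch below is illustrative only: the object name OuterJoinLookup, the method resolve, and the sample data are hypothetical, and it uses the String-to-String table type from the question rather than the Int values above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: names and sample data are illustrative only.
object OuterJoinLookup {

  // Resolves every word of d against m, keeping words that have no match.
  def resolve(d: RDD[Array[String]],
              m: RDD[(String, String)]): RDD[(String, Option[String])] =
    d.flatMap(_.toSeq)   // RDD[Array[String]] -> RDD[String]
      .keyBy(w => w)     // (word, word), so the word can act as the join key
      .leftOuterJoin(m)  // (word, (word, Option[value]))
      .map { case (word, (_, value)) => (word, value) }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("outer-join-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val d = sc.parallelize(Seq(Array("one", "two"), Array("seven")))
      val m = sc.parallelize(Seq("one" -> "1", "two" -> "2"))
      // collect() brings the result to the driver before printing;
      // the order of rows after a join is not guaranteed.
      resolve(d, m).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

Words without a match come back paired with None, which makes missing entries visible instead of losing them.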
If we would like to use a broadcast, we first need to collect the contents of the lookup table locally on the driver in order to broadcast it to all executors. NOTE: the RDD to be broadcast MUST fit in the memory of the driver as well as of each executor.
val rdd = sc.parallelize(Seq(Array("one","two","three"), Array("four", "five", "six")))
val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six"->6))
val bcTable = sc.broadcast(map.collectAsMap)
val res2 = rdd.flatMap{arr => arr.map(elem => (elem, bcTable.value(elem)))}
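Two caveats apply to the broadcast version. First, bcTable.value(elem) throws a NoSuchElementException for any element that is missing from the table. Second, collectAsMap keeps only one value per key, whereas lookup returns all of them. The following sketch handles the first caveat by returning an Option; the object name SafeBroadcastLookup, the method resolve, and the sample data are hypothetical, and it assumes the table has unique keys and fits in memory on the driver and on each executor.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical helper: names and sample data are illustrative only.
object SafeBroadcastLookup {

  // Map-side join against a broadcast copy of m that tolerates missing keys.
  // Assumptions: m is small and has at most one value per key.
  def resolve(d: RDD[Array[String]],
              m: RDD[(String, Int)]): RDD[(String, Option[Int])] = {
    // collectAsMap() runs on the driver; the broadcast ships the resulting
    // Map to each executor once instead of once per task.
    val bcTable = d.sparkContext.broadcast(m.collectAsMap())
    d.flatMap(_.toSeq)
      // Map.get returns None for a missing key instead of throwing.
      .map(elem => (elem, bcTable.value.get(elem)))
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("safe-broadcast-lookup").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val d = sc.parallelize(Seq(Array("one", "two"), Array("seven")))
      val m = sc.parallelize(Seq("one" -> 1, "two" -> 2))
      resolve(d, m).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

If a default value is more convenient than an Option, bcTable.value.getOrElse(elem, someDefault) works the same way.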