How to deal with error SPARK-5063 in Spark

Question

I get the SPARK-5063 error message on the println line:

d.foreach { x =>
  for (i <- 0 until x.length)
    println(m.lookup(x(i)))  // m is an RDD, so this lookup runs inside d.foreach
}

d is an RDD[Array[String]] and m is an RDD[(String, String)]. Is there any way to print the values the way I want? Or how can I convert d from RDD[Array[String]] to Array[String]?

Recommended answer

SPARK-5063 is about giving better error messages when code tries to nest RDD operations, which is not supported.

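It's a usability issue, not a functional one. The root cause is the nesting of RDD operations: m.lookup is called inside d.foreach, whose body runs on the executors, where another RDD cannot be used. The solution is to break that nesting up.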

Here we are effectively trying to join dRDD and mRDD. If mRDD is large, rdd.join is the recommended way. Otherwise, if mRDD is small, i.e. it fits in the memory of each executor, we could collect it, broadcast it and do a 'map-side' join.

A simple join would go like this:

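val rdd = sc.parallelize(Seq(Array("one", "two", "three"), Array("four", "five", "six")))
val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six" -> 6))
// Flatten the arrays and key each element by itself so it can be joined
val flat = rdd.flatMap(_.toSeq).keyBy(x => x)
// Inner join on the key, then keep only the (element, value) pair
val res = flat.join(map).map { case (k, v) => v }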
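
Note that join is an inner join, so elements with no entry in the lookup RDD disappear from the result. If those should be kept, a left outer join is one option. The following is only a sketch under assumptions: the object name JoinSketch, the helper resolveWithJoin and the local[*] master are made up for illustration and are not part of the original answer.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical names, used only for this illustration.
object JoinSketch {
  // Pairs every element of rdd with its value in table, or None when the key is absent.
  def resolveWithJoin(rdd: RDD[Array[String]],
                      table: RDD[(String, Int)]): RDD[(String, Option[Int])] =
    rdd.flatMap(_.toSeq)
      .keyBy(identity)        // (element, element), as in the join above
      .leftOuterJoin(table)   // (element, (element, Option[value]))
      .map { case (elem, (_, value)) => (elem, value) }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just to make the sketch runnable on one machine.
    val conf = new SparkConf().setAppName("join-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val rdd = sc.parallelize(Seq(Array("one", "two"), Array("three", "unknown")))
      val table = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3))
      // collect() moves the results to the driver, so println writes to the driver console.
      resolveWithJoin(rdd, table).collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The order of the printed pairs depends on how the join shuffles the data, so it should not be relied upon.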

If we would like to use a broadcast, we first need to collect the resolution table locally on the driver in order to broadcast it to all executors. NOTE: the RDD to be broadcast MUST fit in the memory of the driver as well as of each executor.

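val rdd = sc.parallelize(Seq(Array("one", "two", "three"), Array("four", "five", "six")))
val map = sc.parallelize(Seq("one" -> 1, "two" -> 2, "three" -> 3, "four" -> 4, "five" -> 5, "six" -> 6))
// Collect the lookup table on the driver and ship one read-only copy to each executor
val bcTable = sc.broadcast(map.collectAsMap)
// Map-side join; bcTable.value(elem) throws NoSuchElementException if elem is not in the table
val res2 = rdd.flatMap { arr => arr.map(elem => (elem, bcTable.value(elem))) }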
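
Applied to the d and m from the question, the broadcast approach could look roughly like the sketch below. It assumes m is small enough to collect, uses Option so that elements missing from m do not throw, and also shows one way to turn d into a local Array[String]. The names Spark5063Sketch and resolveAll, the sample data and the local[*] master are assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical names, used only for this illustration.
object Spark5063Sketch {
  // Resolves every element of d against m without nesting RDD operations.
  // Assumes m fits in the memory of the driver and of each executor.
  def resolveAll(sc: SparkContext,
                 d: RDD[Array[String]],
                 m: RDD[(String, String)]): RDD[(String, Option[String])] = {
    // collectAsMap keeps one value per key, whereas m.lookup returns all values for a key.
    val table = sc.broadcast(m.collectAsMap())
    d.flatMap(arr => arr.map(elem => (elem, table.value.get(elem))))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local master, just to make the sketch runnable on one machine.
    val conf = new SparkConf().setAppName("spark-5063-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val d = sc.parallelize(Seq(Array("one", "two"), Array("three", "missing")))
      val m = sc.parallelize(Seq("one" -> "1", "two" -> "2", "three" -> "3"))

      // Print on the driver: collect() first, then println.
      resolveAll(sc, d, m).collect().foreach { case (k, v) =>
        println(s"$k -> ${v.getOrElse("<no match>")}")
      }

      // Converting RDD[Array[String]] to a local Array[String]; only safe if d is small.
      val local: Array[String] = d.flatMap(_.toSeq).collect()
      println(local.mkString(", "))
    } finally {
      sc.stop()
    }
  }
}
```

Collecting before printing matters because println inside an RDD operation writes to the executors' output rather than the driver console when running on a cluster.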
