NullPointerException in Scala Spark, appears to be caused by collection type?


Question

sessionIdList is of type:

scala> sessionIdList
res19: org.apache.spark.rdd.RDD[String] = MappedRDD[17] at distinct at <console>:30

When I try to run the following code:

val x = sc.parallelize(List(1,2,3)) 
val cartesianComp = x.cartesian(x).map(x => (x))

val kDistanceNeighbourhood = sessionIdList.map(s => {
    cartesianComp.filter(v => v != null)
})

kDistanceNeighbourhood.take(1)

I receive an exception:

14/05/21 16:20:46 ERROR Executor: Exception in task ID 80
java.lang.NullPointerException
        at org.apache.spark.rdd.RDD.filter(RDD.scala:261)
        at $line94.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:38)
        at $line94.$read$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:36)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)

But if I use:

val l = sc.parallelize(List("1","2")) 
val kDistanceNeighbourhood = l.map(s => {    
    cartesianComp.filter(v => v != null)
})

kDistanceNeighbourhood.take(1)

then no exception is thrown.

The difference between the two code snippets is that in the first snippet sessionIdList is of type:

res19: org.apache.spark.rdd.RDD[String] = MappedRDD[17] at distinct at <console>:30

and in the second snippet "l" is of type:

scala> l
res13: org.apache.spark.rdd.RDD[String] = ParallelCollectionRDD[32] at parallelize at <console>:12

Why is this error occurring?

Do I need to convert sessionIdList to a ParallelCollectionRDD in order to fix this?

Answer

Spark doesn't support nesting of RDDs (see https://stackoverflow.com/a/14130534/590203 for another occurrence of the same problem), so you can't perform transformations or actions on RDDs inside of other RDD operations.

In the first case, you're seeing a NullPointerException thrown by a worker when it tries to access a SparkContext object that's only present on the driver, not the workers.

In the second case, my hunch is that the job was run locally on the driver and worked purely by accident.
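One common way to restructure this is to materialize the smaller dataset on the driver first (via `collect()`, optionally wrapped in `sc.broadcast`) and use the resulting plain collection inside the closure, so no RDD reference is captured. A minimal driver-side sketch of that pattern, using plain Scala collections to stand in for the collected RDD contents (the session IDs and values here are hypothetical):

```scala
// Stand-in for x.collect(): the small RDD materialized as a local collection.
val xs = List(1, 2, 3)

// Equivalent of x.cartesian(x), computed as a plain collection.
val cartesianPairs = for (a <- xs; b <- xs) yield (a, b)

// Hypothetical session IDs; in Spark this map would run on sessionIdList,
// with cartesianPairs shipped to the workers as ordinary closure data.
val sessionIds = List("session-1", "session-2")
val kDistanceNeighbourhood = sessionIds.map { s =>
  cartesianPairs.filter(v => v != null) // plain collection ops, no nested RDD
}

println(kDistanceNeighbourhood.head.size) // 9 pairs for each session ID
```

In real Spark code the same shape would be `val pairs = sc.broadcast(cartesianComp.collect())` followed by `sessionIdList.map { s => pairs.value.filter(v => v != null) }`: it is referencing an RDD (or the SparkContext) inside another RDD's closure that is illegal, not shipping the data itself.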

