Spark gives NullPointerException during InputSplit for HBase


Problem Description

I am using Spark 1.2.1, HBase 0.98.10, and Hadoop 2.6.0. I get a NullPointerException while retrieving data from HBase. The stack trace is below.
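
The original post does not include the code that triggers this, but for context, here is a minimal sketch of the usual way an HBase table is read from Spark 1.2.x (the table name and setup are assumptions, not taken from the question). This path builds a NewHadoopRDD, whose getPreferredLocations call is where the trace below originates:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.client.Result
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat
    import org.apache.spark.{SparkConf, SparkContext}

    object HBaseRead {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hbase-read"))

        // Standard HBase client configuration; INPUT_TABLE names the table to scan.
        val hbaseConf = HBaseConfiguration.create()
        hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table") // hypothetical table name

        // newAPIHadoopRDD creates a NewHadoopRDD; its getPreferredLocations
        // is the frame where the NullPointerException below is raised.
        val rdd = sc.newAPIHadoopRDD(
          hbaseConf,
          classOf[TableInputFormat],
          classOf[ImmutableBytesWritable],
          classOf[Result])

        println(rdd.count())
      }
    }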


[sparkDriver-akka.actor.default-dispatcher-2] DEBUG NewHadoopRDD - Failed to use InputSplit#getLocationInfo.
java.lang.NullPointerException: null
    at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114) ~[scala-library-2.10.4.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114) ~[scala-library-2.10.4.jar:na]
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32) ~[scala-library-2.10.4.jar:na]
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108) ~[scala-library-2.10.4.jar:na]
    at org.apache.spark.rdd.HadoopRDD$.convertSplitLocationInfo(HadoopRDD.scala:401) ~[spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.rdd.NewHadoopRDD.getPreferredLocations(NewHadoopRDD.scala:215) ~[spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:234) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.rdd.RDD$$anonfun$preferredLocations$2.apply(RDD.scala:234) [spark-core_2.10-1.2.1.jar:1.2.1]
    at scala.Option.getOrElse(Option.scala:120) [scala-library-2.10.4.jar:na]
    at org.apache.spark.rdd.RDD.preferredLocations(RDD.scala:233) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1326) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1336) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1335) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1335) [spark-core_2.10-1.2.1.jar:1.2.1]
    at scala.collection.immutable.List.foreach(List.scala:318) [scala-library-2.10.4.jar:na]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1335) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1333) [spark-core_2.10-1.2.1.jar:1.2.1]
    at scala.collection.immutable.List.foreach(List.scala:318) [scala-library-2.10.4.jar:na]
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1333) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply$mcVI$sp(DAGScheduler.scala:1336) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1335) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2$$anonfun$apply$2.apply(DAGScheduler.scala:1335) [spark-core_2.10-1.2.1.jar:1.2.1]
    at scala.collection.immutable.List.foreach(List.scala:318) [scala-library-2.10.4.jar:na]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1335) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal$2.apply(DAGScheduler.scala:1333) [spark-core_2.10-1.2.1.jar:1.2.1]
    at scala.collection.immutable.List.foreach(List.scala:318) [scala-library-2.10.4.jar:na]
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$getPreferredLocsInternal(DAGScheduler.scala:1333) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler.getPreferredLocs(DAGScheduler.scala:1304) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$17.apply(DAGScheduler.scala:862) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$17.apply(DAGScheduler.scala:859) [spark-core_2.10-1.2.1.jar:1.2.1]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) [scala-library-2.10.4.jar:na]
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244) [scala-library-2.10.4.jar:na]
    at scala.collection.Iterator$class.foreach(Iterator.scala:727) [scala-library-2.10.4.jar:na]
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157) [scala-library-2.10.4.jar:na]
    at scala.collection.IterableLike$class.foreach(IterableLike.scala:72) [scala-library-2.10.4.jar:na]
    at scala.collection.AbstractIterable.foreach(Iterable.scala:54) [scala-library-2.10.4.jar:na]
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:244) [scala-library-2.10.4.jar:na]
    at scala.collection.AbstractTraversable.map(Traversable.scala:105) [scala-library-2.10.4.jar:na]
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:859) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:778) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGScheduler.handleJobSubmitted(DAGScheduler.scala:762) [spark-core_2.10-1.2.1.jar:1.2.1]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1389) [spark-core_2.10-1.2.1.jar:1.2.1]
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465) [akka-actor_2.10-2.3.4-spark.jar:na]
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375) [spark-core_2.10-1.2.1.jar:1.2.1]
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) [akka-actor_2.10-2.3.4-spark.jar:na]
    at akka.actor.ActorCell.invoke(ActorCell.scala:487) [akka-actor_2.10-2.3.4-spark.jar:na]
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238) [akka-actor_2.10-2.3.4-spark.jar:na]
    at akka.dispatch.Mailbox.run(Mailbox.scala:220) [akka-actor_2.10-2.3.4-spark.jar:na]
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393) [akka-actor_2.10-2.3.4-spark.jar:na]
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) [scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) [scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) [scala-library-2.10.4.jar:na]
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) [scala-library-2.10.4.jar:na]

Please provide a solution to this problem.

Answer

The exception is thrown in the getPreferredLocations phase, so without more information about your HBase configuration I suggest you check that hbase.table.name and hbase.master (I am not sure this last one is the correct property for defining the HMaster) are configured as you want.
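
As a sketch of that advice (host names, port, and table name below are placeholders, not values from the question): from Spark the table is normally named through TableInputFormat.INPUT_TABLE rather than an hbase.table.name key, and an HBase 0.98 client usually locates the cluster through hbase.zookeeper.quorum rather than hbase.master. Printing the resolved values before handing the configuration to Spark makes it easy to see what is actually set:

    import org.apache.hadoop.hbase.HBaseConfiguration
    import org.apache.hadoop.hbase.mapreduce.TableInputFormat

    // Build the configuration explicitly so the relevant settings are visible.
    val hbaseConf = HBaseConfiguration.create()

    // The table to scan; TableInputFormat reads this key.
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table")      // hypothetical name

    // How the client locates the cluster. The ZooKeeper quorum is usually
    // what matters; hbase.master is set here only because the answer mentions it.
    hbaseConf.set("hbase.zookeeper.quorum", "zk-host")           // assumed host
    hbaseConf.set("hbase.zookeeper.property.clientPort", "2181")
    hbaseConf.set("hbase.master", "hmaster-host:60000")          // assumed host:port

    // Print what the configuration actually resolves to.
    Seq(TableInputFormat.INPUT_TABLE, "hbase.zookeeper.quorum", "hbase.master")
      .foreach(k => println(s"$k = ${hbaseConf.get(k)}"))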
