NPE in spark with Joda DateTime
Question
I am receiving a NullPointerException when executing a simple map over a Joda DateTime field in Spark.
Code snippet:
val me1 = (accountId, DateTime.now())
val me2 = (accountId, DateTime.now())
val me3 = (accountId, DateTime.now())
val rdd = spark.parallelize(List(me1, me2, me3))
val result = rdd.map { case (a, d) => (a, d.dayOfMonth().roundFloorCopy()) }.collect.toList
Stacktrace:
java.lang.NullPointerException
    at org.joda.time.DateTime$Property.roundFloorCopy(DateTime.java:2280)
    at x.y.z.jobs.info.AggJobTest$$anonfun$1$$anonfun$2.apply(AggJobTest.scala:47)
    at x.y.z.jobs.info.AggJobTest$$anonfun$1$$anonfun$2.apply(AggJobTest.scala:47)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:780)
    at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:780)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Any suggestions on how to solve this problem?
Update: In order to reproduce the problem you need to use the KryoSerializer:
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
Solution

As you have pointed out, you are using the KryoSerializer with Joda DateTime objects. It appears that the serialization leaves out some required information, so you may wish to use one of the projects that add Joda DateTime support to Kryo. For example, https://github.com/magro/kryo-serializers provides a serializer called JodaDateTimeSerializer, which you could register with:

kryo.register(DateTime.class, new JodaDateTimeSerializer());
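In Spark, the usual way to hook in such a custom serializer is through a KryoRegistrator wired into the SparkConf, rather than calling kryo.register directly. A minimal sketch, assuming the de.javakaffee:kryo-serializers artifact is on the classpath; the class name JodaKryoRegistrator is an assumption for illustration:

```scala
import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer
import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoRegistrator
import org.joda.time.DateTime

// Hypothetical registrator class: tells Kryo to use the dedicated
// Joda serializer for DateTime instead of its default field-based
// serialization, which can drop internal state such as the chronology.
class JodaKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[DateTime], new JodaDateTimeSerializer())
  }
}

// Wire the registrator into the SparkConf used to build the context.
val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrator", "JodaKryoRegistrator")
```

With this configuration in place, the DateTime values survive the executor round trip intact and roundFloorCopy() no longer hits a null internal field.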