NPE in Spark with Joda DateTime


Problem Description


When executing a simple mapping in Spark on a Joda DateTime field, I am receiving a NullPointerException.

Code snippet:

val me1 = (accountId, DateTime.now())
val me2 = (accountId, DateTime.now())
val me3 = (accountId, DateTime.now())
val rdd = spark.parallelize(List(me1, me2, me3))

val result = rdd.map{case (a,d) => (a,d.dayOfMonth().roundFloorCopy())}.collect.toList

Stacktrace:

java.lang.NullPointerException
    at org.joda.time.DateTime$Property.roundFloorCopy(DateTime.java:2280)
    at x.y.z.jobs.info.AggJobTest$$anonfun$1$$anonfun$2.apply(AggJobTest.scala:47)
    at x.y.z.jobs.info.AggJobTest$$anonfun$1$$anonfun$2.apply(AggJobTest.scala:47)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:780)
    at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:780)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1314)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:56)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Any suggestions on how to solve this problem?

Update: In order to reproduce the problem, you need to use the KryoSerializer:

.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

Solution

As you have pointed out, you are using the KryoSerializer with the Joda DateTime object. It appears that the serialization leaves out some required information, so you may wish to look at using one of the projects that adds Kryo support for Joda DateTime objects. For example, https://github.com/magro/kryo-serializers provides a serializer called JodaDateTimeSerializer, which you can register with kryo.register(DateTime.class, new JodaDateTimeSerializer());
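
Below is a minimal sketch of how such a registration could be wired into Spark via a KryoRegistrator; the registrator class name is hypothetical, and it assumes the de.javakaffee:kryo-serializers dependency from the project above is on the classpath:

import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer
import org.apache.spark.serializer.KryoRegistrator
import org.joda.time.DateTime

// Hypothetical registrator that registers a Kryo serializer for Joda DateTime.
class JodaKryoRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    kryo.register(classOf[DateTime], new JodaDateTimeSerializer())
  }
}

You would then point Spark at the registrator alongside the Kryo serializer setting, for example:

.set("spark.kryo.registrator", classOf[JodaKryoRegistrator].getName)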
