Spark + Json4s serialization problems


Problem description

I am using Json4s classes inside of a Spark 2.2.0 closure. The "workaround" for a failure to serialize DefaultFormats is to include their definition inside every closure executed by Spark that needs them. I believe I have done more than I needed to below but still get the serialization failure.

Using Spark 2.2.0, Scala 2.11, and Json4s 3.2.x (whatever ships in Spark); I also tried Json4s 3.5.3 by pulling it into my job with sbt. In all cases I used the workaround shown below.
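
For reference, pulling Json4s 3.5.3 into the job with sbt looks something like this (the module choice is my assumption; Spark itself ships json4s-jackson):

libraryDependencies += "org.json4s" %% "json4s-jackson" % "3.5.3"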

Does anyone know what I'm doing wrong?

import org.json4s.{DefaultFormats, JsonAST}
import org.json4s.JsonAST.JArray

// ItemID and ItemProps are the job's own type aliases
logger.info(s"Creating an RDD for $actionName")
implicit val formats = DefaultFormats
val itemProps = df.rdd.map[(ItemID, ItemProps)](row => { // <--- error points to this line
  implicit val formats = DefaultFormats
  val itemId = row.getString(0)
  val correlators = row.getSeq[String](1).toList
  (itemId, Map(actionName -> JArray(correlators.map { t =>
    implicit val formats = DefaultFormats
    JsonAST.JString(t)
  })))
})

I have also tried another suggestion, which is to set the DefaultFormats implicit in the class constructor area rather than in the closure; no luck anywhere.

The JVM error trace is from Spark complaining that the task is not serializable and pointing to the line above (the last line in my code, anyway); the root cause is then explained with:

Serialization stack:
- object not serializable (class: org.json4s.DefaultFormats$, value: org.json4s.DefaultFormats$@7fdd29f3)
- field (class: com.actionml.URAlgorithm, name: formats, type: class org.json4s.DefaultFormats$)
- object (class com.actionml.URAlgorithm, com.actionml.URAlgorithm@2dbfa972)
- field (class: com.actionml.URAlgorithm$$anonfun$udfLLR$1, name: $outer, type: class com.actionml.URAlgorithm)
- object (class com.actionml.URAlgorithm$$anonfun$udfLLR$1, <function3>)
- field (class: org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$4, name: func$4, type: interface scala.Function3)
- object (class org.apache.spark.sql.catalyst.expressions.ScalaUDF$$anonfun$4, <function1>)
- field (class: org.apache.spark.sql.catalyst.expressions.ScalaUDF, name: f, type: interface scala.Function1)
- object (class org.apache.spark.sql.catalyst.expressions.ScalaUDF, UDF(input[2, bigint, false], input[3, bigint, false], input[5, bigint, false]))
- element of array (index: 1)
- array (class [Ljava.lang.Object;, size 3)
- field (class: org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10, name: references$1, type: class [Ljava.lang.Object;)
- object (class org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10, <function2>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
... 128 more

Recommended answer

Interesting. One typical problem is running into serialization issues with the implicit val formats, but since you define them inside your closure this should be fine.
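
For illustration, here is a minimal runnable sketch (local mode; the names are mine, not the poster's) of why a closure-local formats is safe: the implicit is constructed inside the closure on each executor, so the serialized task carries no driver-side formats field.

import org.apache.spark.sql.SparkSession
import org.json4s.DefaultFormats
import org.json4s.jackson.Serialization

object FormatsScope {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.master("local[*]").appName("formats-scope").getOrCreate()

    // formats is defined inside the closure, so it is constructed on the
    // executor; nothing non-serializable is captured from the driver.
    val json = spark.sparkContext.parallelize(Seq("a", "b")).map { s =>
      implicit val formats = DefaultFormats
      Serialization.write(Map("value" -> s))
    }
    json.collect().foreach(println)

    spark.stop()
  }
}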

I know that this is a bit hacky, but you could try the following:

  1. using @transient implicit val (a sketch follows after this list)
  2. maybe do a minimal test whether JsonAST.JString(t) is serializable (also sketched below)
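
Both suggestions sketched below; this is hedged, not the poster's code: JobLike is a hypothetical stand-in for com.actionml.URAlgorithm, which the stack trace shows being dragged in through the UDF's $outer reference.

import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import org.json4s.{DefaultFormats, Formats, JsonAST}

// Suggestion 1: @transient skips the field when the enclosing object is
// serialized into a task; lazy rebuilds it on first use on the executor.
class JobLike extends Serializable {
  @transient implicit lazy val formats: Formats = DefaultFormats
}

// Suggestion 2: a minimal Java-serialization round trip for JString;
// it throws java.io.NotSerializableException if JString cannot be serialized.
object JStringProbe {
  def main(args: Array[String]): Unit = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    out.writeObject(JsonAST.JString("probe"))
    out.close()
    println("JsonAST.JString survived serialization")
  }
}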
