How to store nested custom objects in Spark Dataset?
Problem description
Spark version: 3.0.1
A non-nested custom type works:
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class AnObj(val a: Int, val b: String)
implicit val myEncoder: Encoder[AnObj] = Encoders.kryo[AnObj]
val d = spark.createDataset(Seq(new AnObj(1, "a")))
d.printSchema
root
|-- value: binary (nullable = true)
However, if the custom type is nested inside a product type (i.e. a case class), it gives an error:
java.lang.UnsupportedOperationException: No Encoder found for InnerObj
import spark.implicits._
import org.apache.spark.sql.{Encoder, Encoders}
class InnerObj(val a: Int, val b: String)
case class MyObj(val i: Int, val j: InnerObj)
implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
// error
val d = spark.createDataset(Seq(new MyObj(1, new InnerObj(0, "a"))))
// it gives Runtime error: java.lang.UnsupportedOperationException: No Encoder found for InnerObj
How can we create a Dataset with a nested custom type?
Recommended answer
Adding encoders for both MyObj and InnerObj should make it work.
import org.apache.spark.sql.{Encoder, Encoders}

class InnerObj(val a: Int, val b: String)
case class MyObj(val i: Int, val j: InnerObj)

// Kryo-based encoders for both the inner and the outer type
implicit val myEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
implicit val objEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]
The above snippet compiles and runs fine.
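To show the encoders in use end to end, here is a minimal sketch that builds the Dataset and reads the objects back. It assumes a local SparkSession; the session setup and the object/main wrapper are illustrative additions, not part of the original answer. Note that because both encoders are Kryo-based, the Dataset's schema is a single `value: binary` column, just as in the non-nested case.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}

object NestedKryoExample {
  class InnerObj(val a: Int, val b: String)
  case class MyObj(val i: Int, val j: InnerObj)

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed setup)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-kryo")
      .getOrCreate()

    // Encoders for both the nested type and the enclosing case class
    implicit val innerEncoder: Encoder[InnerObj] = Encoders.kryo[InnerObj]
    implicit val objEncoder: Encoder[MyObj] = Encoders.kryo[MyObj]

    val d = spark.createDataset(Seq(MyObj(1, new InnerObj(0, "a"))))

    // Kryo serializes the whole object, so the schema is one binary column
    d.printSchema()

    // The objects deserialize back intact on collect
    val first = d.collect().head
    println(s"${first.i} ${first.j.b}")

    spark.stop()
  }
}
```

The trade-off of the Kryo approach is that the stored value is opaque to Spark SQL: you cannot filter or select on fields of InnerObj in queries, only operate on the deserialized objects in typed transformations.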