Why is "Unable to find encoder for type stored in a Dataset" raised when creating a dataset of a custom case class?

Views: 25 | Published: 2021/11/12 5:29:37 | Tags: scala apache-spark apache-spark-dataset apache-spark-encoders

Problem description

Spark 2.0 (final) with Scala 2.11.8. The following very simple code fails to compile with the error: Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    val dataset = sparkSession.createDataset(dataList) // does not compile: no implicit Encoder[SimpleTuple] in scope
  }
}

Recommended answer

Spark Datasets require an Encoder for the data type that is about to be stored. For common types (atomics, product types) a number of predefined encoders are available, but you first have to import them from the implicits object of your SparkSession instance to make it work:

val sparkSession: SparkSession = ???
import sparkSession.implicits._
val dataset = sparkSession.createDataset(dataList)
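
As a minimal sketch, the program from the question could look as follows once the import is in place. The object name DatasetFixed and the calls to show() and stop() are illustrative additions, not part of the original question. Note that the import requires a stable identifier, so sparkSession has to be a val, and the case class stays at the top level of the file.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Kept at the top level, outside the object that uses it.
case class SimpleTuple(id: Int, desc: String)

// Hypothetical name for the corrected version of DatasetTest.
object DatasetFixed {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    // Brings the predefined implicit Encoders (atomics, case classes) into scope.
    // The import needs a stable identifier, so sparkSession must be a val.
    import sparkSession.implicits._

    // The compiler now finds an implicit Encoder[SimpleTuple].
    val dataset: Dataset[SimpleTuple] = sparkSession.createDataset(dataList)
    dataset.show()

    sparkSession.stop()
  }
}
```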

Alternatively, you can directly provide an explicit Encoder:

import org.apache.spark.sql.{Encoder, Encoders}

val dataset = sparkSession.createDataset(dataList)(Encoders.product[SimpleTuple])

or an implicit one:

implicit val enc: Encoder[SimpleTuple] = Encoders.product[SimpleTuple]
val dataset = sparkSession.createDataset(dataList)

In both cases the Encoder must be the one for the stored type (here SimpleTuple).
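
The reason both forms work is that createDataset declares its type parameter with an Encoder context bound, so the compiler either finds an implicit Encoder[T] in scope or expects one in a second argument list. The sketch below illustrates this with a hypothetical helper, toDataset, that uses the same kind of context bound; the helper, the object name, and the printed counts are assumptions made for illustration only.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

case class SimpleTuple(id: Int, desc: String)

object EncoderResolution {
  // Hypothetical helper: the context bound [T: Encoder] makes the compiler
  // look for an implicit Encoder[T], just as createDataset does.
  def toDataset[T: Encoder](spark: SparkSession, data: Seq[T]): Dataset[T] =
    spark.createDataset(data)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("encoder-resolution")
      .getOrCreate()

    val data = Seq(SimpleTuple(5, "abc"), SimpleTuple(6, "bcd"))

    // Explicit: the Encoder is passed in the second argument list.
    val explicitDs = toDataset(spark, data)(Encoders.product[SimpleTuple])

    // Implicit: the Encoder is declared once and picked up by the compiler.
    implicit val enc: Encoder[SimpleTuple] = Encoders.product[SimpleTuple]
    val implicitDs = toDataset(spark, data)

    println(explicitDs.count())
    println(implicitDs.count())

    spark.stop()
  }
}
```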

Note that the Encoders object also provides a number of predefined Encoders for atomic types, and Encoders for complex types can be derived with ExpressionEncoder.
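
A short sketch of what that can look like is given here. Encoders.scalaInt, Encoders.STRING and Encoders.tuple belong to the public Encoders object; ExpressionEncoder lives in the internal catalyst package, so this assumes a Spark 2.x or 3.x build where ExpressionEncoder[T]() is available, and its behaviour may differ between releases. The object name and sample data are illustrative.

```scala
import org.apache.spark.sql.{Encoder, Encoders, SparkSession}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder

object PredefinedEncoders {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .master("local")
      .appName("predefined-encoders")
      .getOrCreate()

    // Predefined Encoders for atomic types.
    val ints = spark.createDataset(Seq(1, 2, 3))(Encoders.scalaInt)
    val strings = spark.createDataset(Seq("abc", "bcd"))(Encoders.STRING)

    // Encoders can be composed, for example into a tuple Encoder.
    val pairEncoder: Encoder[(Int, String)] =
      Encoders.tuple(Encoders.scalaInt, Encoders.STRING)
    val pairs = spark.createDataset(Seq((5, "abc"), (6, "bcd")))(pairEncoder)

    // Assumption: ExpressionEncoder is an internal API (catalyst package);
    // it derives an Encoder for a more complex type from its TypeTag.
    val nestedEncoder: Encoder[(Int, Seq[String])] =
      ExpressionEncoder[(Int, Seq[String])]()
    val nested = spark.createDataset(Seq((1, Seq("a", "b"))))(nestedEncoder)

    ints.show()
    strings.show()
    pairs.show()
    nested.show()

    spark.stop()
  }
}
```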

Further reading:

For custom objects which are not covered by built-in encoders, see How to store custom objects in Dataset?
For Row objects you have to provide the Encoder explicitly, as shown in Encoder error while trying to map dataframe row to updated row
For debug cases, the case class must be defined outside of Main: https://stackoverflow.com/a/34715827/3535853
