Why is "Unable to find encoder for type stored in a Dataset" when creating a dataset of custom case class?

Views: 124 Published: 2020/9/3 23:14:29 Tags: scala apache-spark apache-spark-dataset apache-spark-encoders


Problem description

Spark 2.0 (final) with Scala 2.11.8. The following very simple code fails to compile with this error:

Error:(17, 45) Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.

import org.apache.spark.sql.SparkSession

case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    val dataset = sparkSession.createDataset(dataList)
  }
}

Recommended answer

Spark Datasets require an Encoder for the data type that is about to be stored. For common types (atomics, product types such as case classes) a number of predefined encoders are available, but you first have to import them from the implicits member of your SparkSession instance:

val sparkSession: SparkSession = ???
import sparkSession.implicits._
val dataset = sparkSession.createDataset(dataList)
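
Applied to the program from the question, the fix might look like the sketch below. It assumes Spark 2.x or later is on the classpath. The helper name buildDataset is hypothetical and exists only to make the snippet easy to call. The import has to come after the SparkSession value exists, because implicits is a member of that instance rather than a standalone object.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at top level so that an encoder can be derived for it.
case class SimpleTuple(id: Int, desc: String)

object DatasetTest {
  val dataList = List(
    SimpleTuple(5, "abc"),
    SimpleTuple(6, "bcd")
  )

  // Hypothetical helper: builds the Dataset from the list above.
  def buildDataset(sparkSession: SparkSession): Dataset[SimpleTuple] = {
    // Brings the predefined encoders (including those for case classes)
    // into implicit scope for the createDataset call below.
    import sparkSession.implicits._
    sparkSession.createDataset(dataList)
  }

  def main(args: Array[String]): Unit = {
    val sparkSession = SparkSession.builder
      .master("local")
      .appName("example")
      .getOrCreate()

    buildDataset(sparkSession).show()
    sparkSession.stop()
  }
}
```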

Alternatively, you can provide an Encoder for the stored type directly, either explicitly:

import org.apache.spark.sql.{Encoder, Encoders}

val dataset = sparkSession.createDataset(dataList)(Encoders.product[SimpleTuple])

or implicitly:

implicit val enc: Encoder[SimpleTuple] = Encoders.product[SimpleTuple]
val dataset = sparkSession.createDataset(dataList)

In both cases the Encoder is for the stored type, here SimpleTuple.

Note that the Encoders object also provides a number of predefined Encoders for atomic types, and Encoders for complex types can be derived with ExpressionEncoder.
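
As a rough illustration of the predefined encoders, the sketch below uses Encoders.scalaInt and Encoders.STRING and combines them with Encoders.tuple. The object name PredefinedEncoders and the helpers ids and pairs are hypothetical. ExpressionEncoder is left out here because it lives in Spark's internal catalyst package.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

object PredefinedEncoders {
  // Predefined encoders for atomic types.
  val intEncoder: Encoder[Int] = Encoders.scalaInt
  val stringEncoder: Encoder[String] = Encoders.STRING

  // An encoder for a pair, composed from the two encoders above.
  val pairEncoder: Encoder[(Int, String)] =
    Encoders.tuple(intEncoder, stringEncoder)

  // Hypothetical helper: passes the encoder explicitly, so no implicits import is needed.
  def ids(spark: SparkSession): Dataset[Int] =
    spark.createDataset(Seq(5, 6))(intEncoder)

  // Hypothetical helper: same idea for a Dataset of pairs.
  def pairs(spark: SparkSession): Dataset[(Int, String)] =
    spark.createDataset(Seq((5, "abc"), (6, "bcd")))(pairEncoder)
}
```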

Further reading:

For custom objects which are not covered by the built-in encoders, see How to store custom objects in Dataset?
For Row objects you have to provide an Encoder explicitly, as shown in Encoder error while trying to map dataframe row to updated row
If the error persists while debugging, note that the case class must be defined outside of Main: https://stackoverflow.com/a/34715827/3535853
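
For the first case above, one commonly used option is a Kryo-based encoder. This is only a sketch under assumptions: the class Label, the object CustomObjectExample and the helper labels are hypothetical, and Encoders.kryo stores each object as a single binary column, so its fields are not visible as separate columns.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical plain class: not a case class, so the built-in
// product encoders do not cover it.
class Label(val text: String) extends Serializable

object CustomObjectExample {
  // Kryo-based encoder, picked up implicitly by createDataset below.
  implicit val labelEncoder: Encoder[Label] = Encoders.kryo[Label]

  // Hypothetical helper: builds a small Dataset of Label objects.
  def labels(spark: SparkSession): Dataset[Label] =
    spark.createDataset(Seq(new Label("abc"), new Label("bcd")))
}
```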
