How to use java.time.LocalDate in Datasets (fails with java.lang.UnsupportedOperationException: No Encoder found)?
Question
- Spark 2.1.1
- Scala 2.11.8
- Java 8
- Linux Ubuntu 16.04 LTS
I'd like to transform my RDD into a Dataset. For this, I use the implicits method toDS(), which gives me the following error:
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for java.time.LocalDate
- field (class: "java.time.LocalDate", name: "date")
- root class: "observatory.TemperatureRow"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:602)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:596)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$9.apply(ScalaReflection.scala:587)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
In my case, I must use the type java.time.LocalDate; I can't use java.sql.Date. I have read that I need to inform Spark how to transform the Java type into an SQL type, so to that end I built the two implicit functions below:
implicit def toSerialized(t: TemperatureRow): EncodedTemperatureRow = EncodedTemperatureRow(t.date.toString, t.location, t.temperature)
implicit def fromSerialized(t: EncodedTemperatureRow): TemperatureRow = TemperatureRow(LocalDate.parse(t.date), t.location, t.temperature)
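Note that Spark derives encoders by reflecting over the case-class fields, so implicit conversions like these are never applied automatically; they only help when invoked explicitly (e.g. mapping the RDD before calling toDS). A minimal, Spark-free sketch of the round trip these two functions implement (the `RoundTrip` object name is mine, not from the question):

```scala
import java.time.LocalDate

// Self-contained copies of the question's case classes and conversions,
// so the round trip can be checked without a SparkSession.
case class Location(lat: Double, lon: Double)
case class TemperatureRow(date: LocalDate, location: Location, temperature: Double)
case class EncodedTemperatureRow(date: String, location: Location, temperature: Double)

object RoundTrip {
  // LocalDate.toString emits ISO-8601 (yyyy-MM-dd), which is exactly
  // what LocalDate.parse reads back, so the round trip is lossless.
  def toSerialized(t: TemperatureRow): EncodedTemperatureRow =
    EncodedTemperatureRow(t.date.toString, t.location, t.temperature)

  def fromSerialized(t: EncodedTemperatureRow): TemperatureRow =
    TemperatureRow(LocalDate.parse(t.date), t.location, t.temperature)

  def main(args: Array[String]): Unit = {
    val row = TemperatureRow(LocalDate.parse("2017-01-01"), Location(1.4, 5.1), 4.9)
    val there = toSerialized(row)
    println(there.date)                    // prints: 2017-01-01
    println(fromSerialized(there) == row)  // prints: true
  }
}
```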
Below, some code from my application:
case class Location(lat: Double, lon: Double)
case class TemperatureRow(
date: LocalDate,
location: Location,
temperature: Double
)
case class EncodedTemperatureRow(
date: String,
location: Location,
temperature: Double
)
val s = Seq[TemperatureRow](
TemperatureRow(LocalDate.parse("2017-01-01"), Location(1.4,5.1), 4.9),
TemperatureRow(LocalDate.parse("2014-04-05"), Location(1.5,2.5), 5.5)
)
import spark.implicits._
val temps: RDD[TemperatureRow] = sc.parallelize(s)
val tempsDS = temps.toDS
I don't know why Spark searches for an encoder for java.time.LocalDate, since I provide implicit conversions for TemperatureRow and EncodedTemperatureRow...
Answer
java.time.LocalDate is not supported up to and including Spark 2.2 (and I've been trying to write an Encoder for the type for some time and failed).
You have to convert java.time.LocalDate to some other supported type (e.g. java.sql.Timestamp or java.sql.Date), or to an epoch offset or a date-time string.
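A sketch of that workaround via java.sql.Date, for which Spark 2.x has a built-in encoder. The `SqlDateTemperatureRow` variant is hypothetical (not from the question); in the application you would map the RDD through `toSqlDate` before calling toDS:

```scala
import java.time.LocalDate

case class Location(lat: Double, lon: Double)
case class TemperatureRow(date: LocalDate, location: Location, temperature: Double)

// Hypothetical variant of the row with a Spark-supported date type.
case class SqlDateTemperatureRow(date: java.sql.Date, location: Location, temperature: Double)

object DateWorkaround {
  // java.sql.Date.valueOf and Date.toLocalDate (both since Java 8) form a
  // lossless round trip for calendar dates, so no information is lost.
  def toSqlDate(t: TemperatureRow): SqlDateTemperatureRow =
    SqlDateTemperatureRow(java.sql.Date.valueOf(t.date), t.location, t.temperature)

  def fromSqlDate(t: SqlDateTemperatureRow): TemperatureRow =
    TemperatureRow(t.date.toLocalDate, t.location, t.temperature)

  def main(args: Array[String]): Unit = {
    val row = TemperatureRow(LocalDate.parse("2014-04-05"), Location(1.5, 2.5), 5.5)
    val encoded = toSqlDate(row)
    println(encoded.date)                 // prints: 2014-04-05
    println(fromSqlDate(encoded) == row)  // prints: true
  }
}
```

With this in place, `temps.map(DateWorkaround.toSqlDate).toDS` should encode cleanly, since java.sql.Date is one of the types Spark's reflection-based encoders understand.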