自Spark 2.X起,无法使用scala.None值创建org.apache.spark.sql.Row [英] Unable to create org.apache.spark.sql.Row with scala.None value since Spark 2.X
问题描述
由于Spark 2.X无法使用scala.None值创建org.apache.spark.sql.Row(Spark 1.6.X可能存在)
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string
可复制的示例:
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
spark.createDataFrame(
sc.parallelize(Seq(Row(None))),
StructType(Seq(StructField("v", StringType, true)))
).first
要点: https://gist.github.com/AleksandrPavlenko/bef1c34458883730cc319b2e7378c8c6
好像已在 SPARK-15657 中进行了更改(不确定,仍在尝试证明这一点)
这是预期行为,如 SPARK-19056 (行编码器应接受可选类型):
这是故意的.从来没有记录允许在
Row
中使用Option
,并且在将编码器框架应用于所有类型的操作时会带来很多麻烦.从Spark 2.0开始,请对键入的操作/自定义对象使用Dataset
Since Spark 2.X unable to create org.apache.spark.sql.Row with scala.None value (it was possible for Spark 1.6.X)
Caused by: java.lang.RuntimeException: Error while encoding: java.lang.RuntimeException: scala.None$ is not a valid external type for schema of string
Reproducible example:
import org.apache.spark.sql.types._
import org.apache.spark.sql.Row
spark.createDataFrame(
sc.parallelize(Seq(Row(None))),
StructType(Seq(StructField("v", StringType, true)))
).first
Gist: https://gist.github.com/AleksandrPavlenko/bef1c34458883730cc319b2e7378c8c6
Looks like it was changed in SPARK-15657 (not sure, still trying to prove it)
This is an expected behavior as described in SPARK-19056 (Row encoder should accept optional types):
This is intentional. Allowing
Option
inRow
is never documented and brings a lot of troubles when we apply the encoder framework to all typed operations. Since Spark 2.0, please useDataset
for typed operation/custom objects
这篇关于自Spark 2.X起,无法使用scala.None值创建org.apache.spark.sql.Row的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!