如何处理Spark中缺少的嵌套字段? [英] How to handle missing nested fields in spark?

查看:253
本文介绍了如何处理Spark中缺少的嵌套字段?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

给出两个案例类:

case class Response(
  responseField: String
  ...
  items: List[Item])

case class Item(
  itemField: String
  ...)

我正在创建一个Response数据集:

I am creating a Response dataset:

val dataset = spark.read.format("parquet")
                .load(inputPath)
                .as[Response]
                .map(x => x)

当任何行中都不存在itemField并且spark将引发此错误org.apache.spark.sql.AnalysisException: No such struct field itemField时,就会出现此问题.如果itemField没有嵌套,我可以通过执行dataset.withColumn("itemField", lit(""))来处理它.是否可以在List字段中执行相同的操作?

The issue arises when itemField is not present in any of the rows and spark will raise this error org.apache.spark.sql.AnalysisException: No such struct field itemField. If itemField was not nested I could handle it by doing dataset.withColumn("itemField", lit("")). Is it possible to do the same within the List field?

推荐答案

我假设以下内容:

数据使用以下架构编写:

Data was written with the following schema:

case class Item(itemField: String)
case class Response(responseField: String, items: List[Item])
Seq(Response("a", List()), Response("b", List())).toDF.write.parquet("/tmp/structTest")

现在架构已更改为:

case class Item(itemField: String, newField: Int)
case class Response(responseField: String, items: List[Item])
spark.read.parquet("/tmp/structTest").as[Response].map(x => x) // Fails

对于Spark 2.4,请参阅: Spark-如何向数组添加元素结构

For Spark 2.4 please see: Spark - How to add an element to an array of structs

对于Spark 2.3,这应该可以工作:

For Spark 2.3 this should work:

val addNewField: (Array[String], Array[Int]) => Array[Item] = (itemFields, newFields) => itemFields.zip(newFields).map { case (i, n) => Item(i, n) }

val addNewFieldUdf = udf(addNewField)
spark.read.parquet("/tmp/structTest")
   .withColumn("items", addNewFieldUdf(
      col("items.itemField") as "itemField", 
      array(lit(1)) as "newField"
   )).as[Response].map(x => x) // Works

这篇关于如何处理Spark中缺少的嵌套字段?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆