Saving empty DataFrame with known schema (Spark 2.2.1)

Problem description

Is it possible to save an empty DataFrame with a known schema such that the schema is written to the file, even though it has 0 records?

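import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.StructType

def example(spark: SparkSession, path: String, schema: StructType) = {
  // Build an empty DataFrame that still carries the known schema
  val dataframe = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
  val dataframeWriter = dataframe.write.mode(SaveMode.Overwrite).format("parquet")
  dataframeWriter.save(path)

  spark.read.load(path) // ERROR!! No files to read, so schema unknown
}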


Recommended answer

This is the answer I received from Databricks Support:


This is actually a known issue in Spark. A fix has already been made in open source, tracked in JIRA: https://issues.apache.org/jira/browse/SPARK-23271. For more details on how this behavior changes as of 2.4, see this doc change: https://github.com/apache/spark/pull/20525/files#diff-d8aa7a37d17a1227cba38c99f9f22511R1808. The behavior will change starting with Spark 2.4. Until then, you need to use one of the following workarounds:


  1. Save a DataFrame with at least one record to preserve its schema
  2. Save the schema in a JSON file and use it later
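The second workaround (keeping the schema in a JSON file) could look roughly like the sketch below. It is not part of the Databricks answer. The object name SchemaSidecar, its method names, and the choice of a local file for the schema are assumptions made for illustration; on a cluster the schema file would more likely live on shared storage. It also assumes the data directory exists, even if it holds no Parquet files.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper: stores the schema next to the data as a JSON "sidecar" file.
object SchemaSidecar {

  // Serialize the schema with StructType.json and write it to a local file.
  def saveSchema(schema: StructType, schemaPath: String): Unit = {
    Files.write(Paths.get(schemaPath), schema.json.getBytes(StandardCharsets.UTF_8))
    ()
  }

  // Read the JSON back and rebuild the StructType via DataType.fromJson.
  def loadSchema(schemaPath: String): StructType = {
    val json = new String(Files.readAllBytes(Paths.get(schemaPath)), StandardCharsets.UTF_8)
    DataType.fromJson(json).asInstanceOf[StructType]
  }

  // Pass the schema explicitly so Spark does not try to infer it from data files.
  def readWithSchema(spark: SparkSession, dataPath: String, schemaPath: String): DataFrame =
    spark.read.schema(loadSchema(schemaPath)).parquet(dataPath)
}
```

With this in place, the writer would call SchemaSidecar.saveSchema(schema, schemaPath) alongside dataframeWriter.save(path), and the reader would call SchemaSidecar.readWithSchema(spark, path, schemaPath) instead of spark.read.load(path).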

