Saving empty DataFrame with known schema (Spark 2.2.1)

Views: 83 Published: 2020/10/16 19:55:46 Tags: apache-spark parquet databricks


Question

Is it possible to save an empty DataFrame with a known schema so that the schema is written to the file, even though the DataFrame has 0 records?

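import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.types.StructType

def example(spark: SparkSession, path: String, schema: StructType) = {
  // Build an empty DataFrame that still carries the known schema.
  val dataframe = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
  val dataframeWriter = dataframe.write.mode(SaveMode.Overwrite).format("parquet")
  dataframeWriter.save(path)

  spark.read.load(path) // ERROR!! No files to read, so schema unknown
}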

Recommended answer

This is the answer I received from Databricks Support:

This is actually a known issue in Spark. A fix has already been made in open source; see the JIRA ticket at https://issues.apache.org/jira/browse/SPARK-23271. For more details on how the behavior changes, see this doc change: https://github.com/apache/spark/pull/20525/files#diff-d8aa7a37d17a1227cba38c99f9f22511R1808. The behavior will change starting with Spark 2.4. Until then, you need to use one of the following workarounds:

Save a DataFrame with at least one record to preserve its schema
Save the schema in a JSON file and use it later
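
The second workaround (keeping the schema in a JSON file) could look roughly like the sketch below. It is not part of the Databricks answer. The SchemaFile object and its save and load methods are hypothetical names introduced here for illustration, and the sketch assumes the schema file lives on a local filesystem path reachable through java.nio (a DBFS or HDFS location would need a different file API). It relies on StructType.json and DataType.fromJson from Spark SQL.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}

import org.apache.spark.sql.types.{DataType, StructType}

// Hypothetical helper (not from the original answer): stores a schema next to the data.
object SchemaFile {
  // Serialize the schema to its JSON representation and write it to a local file.
  def save(schema: StructType, schemaPath: String): Unit = {
    Files.write(Paths.get(schemaPath), schema.json.getBytes(StandardCharsets.UTF_8))
    ()
  }

  // Read the JSON text back and rebuild the StructType from it.
  def load(schemaPath: String): StructType = {
    val json = new String(Files.readAllBytes(Paths.get(schemaPath)), StandardCharsets.UTF_8)
    DataType.fromJson(json).asInstanceOf[StructType]
  }
}
```

With the schema restored by SchemaFile.load, the empty DataFrame can be rebuilt the same way as in the question, using spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema), instead of relying on schema inference from the Parquet directory.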
