Create Spark DataFrame schema from JSON schema representation
Question
Is there a way to serialize a DataFrame schema to JSON and deserialize it later on?
The use case is simple: I have a JSON configuration file that contains the schemas of the DataFrames I need to read. I want to be able to create the default configuration from an existing schema (taken from a DataFrame), and later to regenerate the relevant schema by reading it back from the JSON string.
Answer
There are two steps: creating the JSON string from an existing DataFrame, and creating the schema from the previously saved JSON string.
Creating the string from an existing DataFrame
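// df is an existing DataFrame; its schema is a StructType
val schema = df.schema
// Serialize the schema to a compact JSON string
val jsonString = schema.json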
Creating the schema from the JSON
import org.apache.spark.sql.types.{DataType, StructType}
// fromJson returns a generic DataType; a DataFrame schema is a StructType, so cast it
val newSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]
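To tie the two steps back to the configuration-file use case in the question, here is a minimal round-trip sketch. It assumes Spark SQL is on the classpath. The object name `SchemaRoundTrip`, the two-column schema, and the temporary file are hypothetical stand-ins for `df.schema` and a real configuration file, not part of the original answer.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.types.{DataType, LongType, StringType, StructField, StructType}

object SchemaRoundTrip {
  def main(args: Array[String]): Unit = {
    // Hypothetical schema standing in for df.schema
    val original = StructType(Seq(
      StructField("id", LongType, nullable = false),
      StructField("name", StringType, nullable = true)
    ))

    // Hypothetical stand-in for the JSON configuration file
    val path = Files.createTempFile("schema", ".json")

    // Step 1: serialize the schema and save it
    Files.write(path, original.json.getBytes(StandardCharsets.UTF_8))

    // Step 2: read the string back and rebuild the schema
    val jsonString = new String(Files.readAllBytes(path), StandardCharsets.UTF_8)
    val restored = DataType.fromJson(jsonString).asInstanceOf[StructType]

    // The restored schema should equal the one that was saved
    assert(restored == original)

    Files.delete(path)
  }
}
```

With a `SparkSession` available (assumed here to be named `spark`), the restored schema could then be applied when reading, for example `spark.read.schema(restored).json(...)`, so Spark does not have to infer the schema from the data.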