Get a DataFrame schema and load it into a metadata table


Problem description


The use case is to read a file and create a DataFrame on top of it, then get the schema of that file and store it in a DB table.


For example purposes, I am just creating a case class and getting the printSchema output; however, I am unable to create a DataFrame out of it.

Here is the sample code:

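import org.apache.spark.sql.SparkSession

// One row of employee data; defined at top level so Spark can derive an Encoder for it
case class Employee(Name:String, Age:Int, Designation:String, Salary:Int, ZipCode:Int)

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.master", "local")
  .getOrCreate()

import spark.implicits._ // enables .toDF on local Seqs of case classes
val EmployeesData = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798))
val Employee_DataFrame = EmployeesData.toDF
val dfschema = Employee_DataFrame.schema // a StructType describing the columns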
val dfschema = Employee_DataFrame.schema


Now dfschema is a StructType, and I want to convert it into a DataFrame with two columns. How can I achieve that?

Recommended answer

Spark >= 2.4.0


To save the schema as a string, you can use the toDDL method of StructType. In your case, the DDL string should be:

`Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT
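If you would rather store one row per column, which matches the two-column DataFrame asked for in the question, a minimal sketch is to map the StructType's fields to (name, type) pairs. The object SchemaMetadata, the helper schemaToDF, and the column names column_name and data_type are hypothetical choices for illustration; the type of each field is rendered with simpleString.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

object SchemaMetadata {
  // Hypothetical helper: one output row per schema field, as (column_name, data_type)
  def schemaToDF(spark: SparkSession, schema: StructType): DataFrame = {
    import spark.implicits._
    schema.fields.toSeq
      .map(f => (f.name, f.dataType.simpleString)) // e.g. ("Age", "int")
      .toDF("column_name", "data_type")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-metadata")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val dfschema = Seq(Employee("Anto", 21, "Software Engineer", 2000, 56798)).toDF().schema
    schemaToDF(spark, dfschema).show(false)
    spark.stop()
  }
}
```

The DataFrame returned by schemaToDF can be persisted like any other, for example with write.jdbc(url, table, connectionProperties), where the URL and table are whatever your metadata database uses.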


After saving the schema, you can load it from the database and pass it to StructType.fromDDL(my_schema). This returns an instance of StructType, which you can use to create a new DataFrame with spark.createDataFrame, as @Ajay already mentioned.
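A minimal sketch of that round trip, assuming the DDL string has already been read back from the metadata table (it is hard-coded here as a stand-in) and that the data arrives as Rows in the same field order as the schema. The object SchemaRoundTrip and the helper rebuild are hypothetical names.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.StructType

object SchemaRoundTrip {
  // Hypothetical helper: parse a stored DDL string and apply it to raw rows
  def rebuild(spark: SparkSession, ddl: String, data: Seq[Row]): DataFrame = {
    val schema: StructType = StructType.fromDDL(ddl)
    // Each Row must match the schema's field order and types
    spark.createDataFrame(spark.sparkContext.parallelize(data), schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("schema-roundtrip")
      .master("local[*]")
      .getOrCreate()

    // Stand-in for the value loaded from the metadata table
    val my_schema = "`Name` STRING, `Age` INT, `Designation` STRING, `Salary` INT, `ZipCode` INT"
    val df = rebuild(spark, my_schema, Seq(Row("Anto", 21, "Software Engineer", 2000, 56798)))
    df.printSchema()
    df.show(false)
    spark.stop()
  }
}
```

For the file-based use case in the question, the same parsed StructType can also be handed to spark.read.schema(...) before loading the file, instead of building Rows by hand.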


It is also useful to remember that you can always extract the schema from a case class with:

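import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

// Derive the schema from the case class via reflection, without creating any data
val empSchema = ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]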


You can then get the DDL representation with empSchema.toDDL.
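Putting the two steps together, a minimal sketch that derives the DDL string straight from the case class (no SparkSession or data needed) and parses it back. It assumes Spark >= 2.4 for toDDL; CaseClassDDL is a hypothetical name.

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.StructType

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

object CaseClassDDL {
  // Schema derived from the case class via reflection
  val empSchema: StructType =
    ScalaReflection.schemaFor[Employee].dataType.asInstanceOf[StructType]

  // String form to store in the metadata table
  val ddl: String = empSchema.toDDL

  // Parsed back later, e.g. after reading the string from the database
  val restored: StructType = StructType.fromDDL(ddl)

  def main(args: Array[String]): Unit = {
    println(ddl)
    println(restored.simpleString)
  }
}
```

When checking the round trip, comparing field names and types rather than whole StructTypes is deliberate: depending on the Spark version, the DDL string may not carry nullability (Age is non-nullable in the case class but may come back nullable), so restored == empSchema is not guaranteed.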

Spark < 2.4


For Spark < 2.4, use DataType.fromDDL and schema.simpleString in place of StructType.fromDDL and toDDL. Also, instead of a StructType you should work with a DataType instance, omitting the cast to StructType:

val empSchema = ScalaReflection.schemaFor[Employee].dataType


Sample output for empSchema.simpleString:

struct<Name:string,Age:int,Designation:string,Salary:int,ZipCode:int>
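A minimal sketch of this pre-2.4 variant, keeping empSchema as a DataType and round-tripping it through simpleString. It assumes a Spark version in which DataType.fromDDL is accessible from user code and accepts a struct<...> type string; LegacySchemaString is a hypothetical name.

```scala
import org.apache.spark.sql.catalyst.ScalaReflection
import org.apache.spark.sql.types.{DataType, StructType}

case class Employee(Name: String, Age: Int, Designation: String, Salary: Int, ZipCode: Int)

object LegacySchemaString {
  // Kept as a DataType, without the cast to StructType
  val empSchema: DataType = ScalaReflection.schemaFor[Employee].dataType

  // Value to store, in the struct<...> form shown above
  val stored: String = empSchema.simpleString

  // Parsed back into a DataType; a struct<...> string should yield a StructType
  val restored: DataType = DataType.fromDDL(stored)

  def main(args: Array[String]): Unit = {
    println(stored)
    restored match {
      case s: StructType => println(s.fieldNames.mkString(", "))
      case other         => println(s"Unexpected type: $other")
    }
  }
}
```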

