Dynamically build case class or schema


Question

Given a list of strings, is there a way to create a case class or a schema without typing the strings in manually?

For example, I have a list:

 val name_list=Seq("Bob", "Mike", "Tim")

The list will not always be the same. Sometimes it will contain different names, and its size will vary.

I can create a case class

case class names(Bob: Integer, Mike: Integer, Tim: Integer)

or a schema

 val schema = StructType(StructField("Bob", IntegerType, true) ::
            StructField("Mike", IntegerType, true) ::
            StructField("Tim", IntegerType, true) :: Nil)

but I have to do it manually. I am looking for a way to do this dynamically.

Recommended answer

Assuming all the columns have the same data type:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

val nameList=Seq("Bob", "Mike", "Tim")

val schema = StructType(nameList.map(n => StructField(n, IntegerType, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
//   StructField(Bob,IntegerType,true), StructField(Mike,IntegerType,true), StructField(Tim,IntegerType,true)
// )

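// rdd is assumed to be an existing RDD[Row] whose rows hold one Integer per name, in nameList order
spark.createDataFrame(rdd, schema)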
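The answer does not show where `rdd` comes from. As a minimal sketch, assuming a local `SparkSession` and some made-up sample values (both are illustrative assumptions, not part of the original answer), the rows can be built with `Row.fromSeq` so that nothing depends on the number of names:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("dynamic-schema-sketch")
  .getOrCreate()

val nameList = Seq("Bob", "Mike", "Tim")

// One nullable IntegerType column per name, however many names there are.
val schema = StructType(nameList.map(n => StructField(n, IntegerType, nullable = true)))

// Hypothetical sample data: each inner Seq holds one Int per name, in nameList order.
val data = Seq(Seq(1, 2, 3), Seq(4, 5, 6))
val rdd = spark.sparkContext.parallelize(data.map(values => Row.fromSeq(values)))

val df = spark.createDataFrame(rdd, schema)
df.printSchema()
```

Each row must have as many values as the schema has fields, and in the same order. A mismatch is not caught when the DataFrame is created; it usually only surfaces once an action evaluates the rows.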

If the data types differ, you'll have to provide them as well (in which case it might not save much time compared with assembling the schema manually):

val typeList = Array[DataType](StringType, IntegerType, DoubleType)
val colSpec = nameList zip typeList

val schema = StructType(colSpec.map(cs => StructField(cs._1, cs._2, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
//   StructField(Bob,StringType,true), StructField(Mike,IntegerType,true), StructField(Tim,DoubleType,true)
// )
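If the type information also arrives as strings (for example from a config file), one possible approach is to translate each type name into a `DataType` before building the fields. The sketch below is only an illustration: `toDataType` is a hypothetical helper, and the set of type names it accepts is an assumption rather than anything defined by Spark or by the original answer.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: maps a user-supplied type name to a Spark DataType.
def toDataType(typeName: String): DataType = typeName.toLowerCase match {
  case "string" => StringType
  case "int"    => IntegerType
  case "double" => DoubleType
  case other    => throw new IllegalArgumentException(s"Unsupported type: $other")
}

// Assumed input: (column name, type name) pairs supplied at runtime.
val colSpec = Seq("Bob" -> "string", "Mike" -> "int", "Tim" -> "double")

val schema = StructType(colSpec.map { case (name, typeName) =>
  StructField(name, toDataType(typeName), nullable = true)
})
```

Keeping names and types together as pairs also avoids a pitfall of `nameList zip typeList`: `zip` silently truncates to the shorter collection, so a missing type would drop a column instead of raising an error.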
