Dynamically build case class or schema
Question
Given a list of strings, is there a way to create a case class or a schema without inputting the strings manually?
For example, I have a List,
val name_list=Seq("Bob", "Mike", "Tim")
The List will not always be the same. Sometimes it will contain different names and will vary in size.
I can create a case class
case class names(Bob: Integer, Mike: Integer, Tim: Integer)
or a schema
val schema = StructType(StructField("Bob", IntegerType, true) ::
  StructField("Mike", IntegerType, true) ::
  StructField("Tim", IntegerType, true) :: Nil)
but I have to do it manually. I am looking for a way to perform this operation dynamically.
Answer
Assuming the data types of the columns are all the same:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
val nameList=Seq("Bob", "Mike", "Tim")
val schema = StructType(nameList.map(n => StructField(n, IntegerType, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
// StructField(Bob,IntegerType,true), StructField(Mike,IntegerType,true), StructField(Tim,IntegerType,true)
// )
// `rdd` is assumed to be an existing RDD[Row] whose values line up with the schema
spark.createDataFrame(rdd, schema)
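To make the step above concrete, here is a minimal end-to-end sketch. The `SparkSession` setup and the sample rows are illustrative assumptions (in `spark-shell`, a session named `spark` already exists):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical local session for the sketch; in spark-shell use the provided `spark`
val spark = SparkSession.builder()
  .appName("dynamic-schema")
  .master("local[*]")
  .getOrCreate()

val nameList = Seq("Bob", "Mike", "Tim")
val schema = StructType(nameList.map(n => StructField(n, IntegerType, nullable = true)))

// Sample rows matching the generated schema positionally
val rdd = spark.sparkContext.parallelize(Seq(Row(1, 2, 3), Row(4, 5, 6)))

val df = spark.createDataFrame(rdd, schema)
df.printSchema()
```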
If the data types differ, you'll have to provide them as well (in which case it may not save much time compared with assembling the schema manually):
val typeList = Array[DataType](StringType, IntegerType, DoubleType)
val colSpec = nameList zip typeList
val schema = StructType(colSpec.map(cs => StructField(cs._1, cs._2, true)))
// schema: org.apache.spark.sql.types.StructType = StructType(
// StructField(Bob,StringType,true), StructField(Mike,IntegerType,true), StructField(Tim,DoubleType,true)
// )
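Note that a case class itself cannot be generated at runtime in plain Scala (it is a compile-time construct), which is why the dynamic route goes through `StructType` and `Row`. As a hedged sketch of applying the mixed-type schema above (the session and sample data are assumptions, and pattern-matching on the pair reads a little better than `._1`/`._2`):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Hypothetical local session for the sketch
val spark = SparkSession.builder().master("local[*]").getOrCreate()

val nameList = Seq("Bob", "Mike", "Tim")
val typeList = Seq[DataType](StringType, IntegerType, DoubleType)

val schema = StructType(
  nameList.zip(typeList).map { case (name, dataType) =>
    StructField(name, dataType, nullable = true)
  }
)

// Row values must line up positionally with the schema's types
val rdd = spark.sparkContext.parallelize(Seq(Row("a", 1, 1.5), Row("b", 2, 2.5)))
val df = spark.createDataFrame(rdd, schema)
```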