Spark: create a nested schema

Question

With Spark, given the following:

import spark.implicits._
val data = Seq(
  (1, ("value11", "value12")),
  (2, ("value21", "value22")),
  (3, ("value31", "value32"))
)

val df = data.toDF("id", "v1")
df.printSchema()

The result is:

root
|-- id: integer (nullable = false)
|-- v1: struct (nullable = true)
|    |-- _1: string (nullable = true)
|    |-- _2: string (nullable = true)

Now, if I want to create the schema myself, how should I proceed?

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", ???)
))

Thanks.

Recommended answer

According to the example here: https://spark.apache.org/docs/2.4.0/api/java/org/apache/spark/sql/types/StructType.html

 import org.apache.spark.sql._
 import org.apache.spark.sql.types._

 val innerStruct =
   StructType(
     StructField("f1", IntegerType, true) ::
     StructField("f2", LongType, false) ::
     StructField("f3", BooleanType, false) :: Nil)

 val struct = StructType(
   StructField("a", innerStruct, true) :: Nil)

 // Create a Row matching the schema defined by struct
 // (f2 is LongType, so the second value is written as a Long literal)
 val row = Row(Row(1, 2L, true))

In your case, it would be:

import org.apache.spark.sql._
import org.apache.spark.sql.types._

val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
      StructField("value1", StringType),
      StructField("value2", StringType)
  )))
))

Output:

StructType(
  StructField(id,IntegerType,true), 
  StructField(nested,StructType(
    StructField(value1,StringType,true), 
    StructField(value2,StringType,true)
  ),true)
)
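
To see the nested schema in use, here is a minimal sketch that builds a DataFrame from it. It assumes a local SparkSession (in spark-shell, `getOrCreate()` simply returns the existing session), reuses the sample values from the question, and uses a made-up application name. The point it illustrates is that each value of a struct column is itself a `Row`.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types._

// Assumption: a local session for illustration only.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("nested-schema-sketch") // hypothetical name
  .getOrCreate()

// The schema from the answer above.
val schema = StructType(Array(
  StructField("id", IntegerType),
  StructField("nested", StructType(Array(
    StructField("value1", StringType),
    StructField("value2", StringType)
  )))
))

// Each nested struct value is represented by its own Row.
val rows = Seq(
  Row(1, Row("value11", "value12")),
  Row(2, Row("value21", "value22")),
  Row(3, Row("value31", "value32"))
)

// Apply the explicit schema to an RDD[Row].
val df = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
df.printSchema()

// Nested fields can be addressed with dot notation.
df.select("id", "nested.value1").show()
```

Note that `StructField` defaults to `nullable = true`, so `id` is reported as nullable here, whereas the schema inferred in the question has `id` as `nullable = false`. Passing `nullable = false` for `id`, and naming the fields `v1`, `_1` and `_2`, should reproduce the inferred schema exactly.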
