Access names of fields in struct Spark SQL
Question
I am trying to 'lift' the fields of a struct to the top level in a dataframe, as illustrated by this example:
case class A(a1: String, a2: String)
case class B(b1: String, b2: A)
val df = Seq(B("X",A("Y","Z"))).toDF
df.show
+---+-----+
| b1| b2|
+---+-----+
| X|[Y,Z]|
+---+-----+
df.printSchema
root
|-- b1: string (nullable = true)
|-- b2: struct (nullable = true)
| |-- a1: string (nullable = true)
| |-- a2: string (nullable = true)
val lifted = df.withColumn("a1", $"b2.a1").withColumn("a2", $"b2.a2").drop("b2")
lifted.show
+---+---+---+
| b1| a1| a2|
+---+---+---+
| X| Y| Z|
+---+---+---+
lifted.printSchema
root
|-- b1: string (nullable = true)
|-- a1: string (nullable = true)
|-- a2: string (nullable = true)
This works. I would like to create a little utility method which does this for me, probably through pimping DataFrame to enable something like df.lift("b2").
To do this, I think I want a way of obtaining a list of all fields within a Struct. E.g. given "b2" as input, return ["a1","a2"]. How do I do this?
Answer
If I understand your question correctly, you want to be able to list the nested fields of column b2.
So you would need to filter on b2, access the StructType of b2 and then map the names of the columns from within the fields (StructField):
import org.apache.spark.sql.types.StructType
val nested_fields = df.schema
.filter(c => c.name == "b2")
.flatMap(_.dataType.asInstanceOf[StructType].fields)
.map(_.name)
// nested_fields: Seq[String] = List(a1, a2)
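Building on this, the df.lift("b2") utility the question asks for can be sketched as an implicit class wrapping DataFrame. This is a sketch under the assumption of a compiled Spark application with a SparkSession in scope; the wrapper name LiftOps is my own, not a Spark API:

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Hypothetical pimp-my-library wrapper: df.lift("b2") replaces the struct
// column "b2" with its nested fields promoted to the top level.
implicit class LiftOps(df: DataFrame) {
  def lift(colName: String): DataFrame = {
    // Names of the fields nested inside the struct column, as above
    val nested = df.schema
      .filter(_.name == colName)
      .flatMap(_.dataType.asInstanceOf[StructType].fields)
      .map(_.name)
    // Keep every other column, then append one column per nested field
    val others = df.columns.filterNot(_ == colName).map(col)
    val lifted = nested.map(f => col(s"$colName.$f").as(f))
    df.select(others ++ lifted: _*)
  }
}
```

For the simple case, Spark's built-in star expansion on struct columns achieves the same lifting without a helper: df.select($"b1", $"b2.*").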