Get elements of a struct-typed Row field by name in Spark Scala
Question
In a DataFrame object in Apache Spark (I'm using the Scala interface), if I'm iterating over its Row objects, is there any way to extract struct values by name?
I am using the code below to extract values by name, but I am having trouble reading the struct value.
If the values had been of type String, we could have done this:
val resultDF = joinedDF.rdd.map { row =>
  val id = row.getAs[Long]("id")
  val values = row.getAs[String]("slotSize")
  val fields = row.getAs[String](values)
  (id, values, fields)
}.toDF("id", "values", "fields")
But in my case, values has the schema below:
 |-- v1: struct (nullable = true)
 |    |-- level1: string (nullable = true)
 |    |-- level2: string (nullable = true)
 |    |-- level3: string (nullable = true)
 |    |-- level4: string (nullable = true)
 |    |-- level5: string (nullable = true)
Given that the value has the structure above, what should I replace this line with to make the code work?
row.getAs[String](values)
Recommended answer
You can access the struct's elements by first extracting another Row from the top-level Row (structs are modeled as another Row in Spark), like this:
Scala implementation
val level1 = row.getAs[Row]("struct").getAs[String]("level1")
Java implementation
String level1 = f.<Row>getAs("struct").getAs("level1").toString();
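Applied to the schema in the question, the same idea might look like the sketch below. It assumes the struct column is named v1 and that an id column of type Long exists; the Levels and Record case classes, the StructFieldByName object, and the helper names are hypothetical and only serve to build sample data. Because v1 is nullable, the nested Row is wrapped in Option before a field is read from it.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object StructFieldByName {
  // Hypothetical sample types; they only exist to build a DataFrame
  // whose "v1" column is a struct like the one in the question.
  case class Levels(level1: String, level2: String)
  case class Record(id: Long, v1: Levels)

  // Reads the nested struct as a Row, then reads a field from it by name.
  // Option(...) guards against a null struct, since v1 is nullable.
  def level1Of(row: Row): String =
    Option(row.getAs[Row]("v1")).map(_.getAs[String]("level1")).orNull

  // Mirrors the rdd.map pattern from the question, with the struct lookup added.
  def extract(spark: SparkSession, df: DataFrame): DataFrame = {
    import spark.implicits._
    df.rdd.map(row => (row.getAs[Long]("id"), level1Of(row))).toDF("id", "level1")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("struct-by-name")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(Record(1L, Levels("a", "b")), Record(2L, Levels("c", "d"))).toDF()
    extract(spark, df).show()

    // Column-based alternative that avoids the RDD round trip:
    // a dotted name addresses a field inside a struct column.
    df.select($"id", $"v1.level1").show()

    spark.stop()
  }
}
```

If only a few nested fields are needed, the column-based select is usually the simpler choice, since it stays within the DataFrame API and does not require mapping over Row objects at all.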