Rename Spark DataFrame StructType fields
Question
Given a DataFrame with a dynamic StructType column: the struct's name is not known in advance and can change, so do not assume that "MAIN_COL" is present in the schema. The schema below uses MAIN_COL only as an example name.
root
|-- MAIN_COL: struct (nullable = true)
| |-- a: string (nullable = true)
| |-- b: string (nullable = true)
| |-- c: string (nullable = true)
| |-- d: string (nullable = true)
| |-- f: long (nullable = true)
| |-- g: long (nullable = true)
| |-- h: long (nullable = true)
| |-- j: long (nullable = true)
How can we write dynamic code that renames the fields of a StructType so that each field is prefixed with the struct's own name?
root
|-- MAIN_COL: struct (nullable = true)
| |-- MAIN_COL_a: string (nullable = true)
| |-- MAIN_COL_b: string (nullable = true)
| |-- MAIN_COL_c: string (nullable = true)
| |-- MAIN_COL_d: string (nullable = true)
| |-- MAIN_COL_f: long (nullable = true)
| |-- MAIN_COL_g: long (nullable = true)
| |-- MAIN_COL_h: long (nullable = true)
| |-- MAIN_COL_j: long (nullable = true)
Recommended answer
You can use the DataFrame DSL to update the schema of a nested column: build a new StructType with the renamed fields and cast the struct column to it.
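import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._
// Read the struct column's name from the schema instead of hard-coding "MAIN_COL".
val structCol = df.schema.fields.head
val schema: StructType = structCol.dataType.asInstanceOf[StructType]
val updatedSchema = StructType(
  // Prefix each nested field with the struct's name; copy keeps type and nullability.
  schema.fields.map(sf => sf.copy(name = s"${structCol.name}_${sf.name}"))
)
// A struct-to-struct cast matches fields by position, so only the names change.
val resultDF = df.withColumn(structCol.name, col(structCol.name).cast(updatedSchema))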
Updated schema:
root
|-- MAIN_COL: struct (nullable = false)
| |-- MAIN_COL_a: string (nullable = true)
| |-- MAIN_COL_b: string (nullable = true)
| |-- MAIN_COL_c: string (nullable = true)
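The same idea can be extended to a DataFrame where the struct is not the first column, or where there are several struct columns. The following is a minimal sketch, not part of the original answer: the helper name `prefixStructFields`, the object name, and the sample column name `ANY_NAME` are hypothetical and chosen only for illustration. It assumes top-level column names contain no dots or other characters that `col` would interpret specially.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct}
import org.apache.spark.sql.types.{StructField, StructType}

object PrefixStructFields {
  // Hypothetical helper: prefixes the fields of every top-level struct column
  // with that column's own name, e.g. MAIN_COL.a becomes MAIN_COL.MAIN_COL_a.
  def prefixStructFields(df: DataFrame): DataFrame =
    df.schema.fields.foldLeft(df) {
      case (acc, StructField(name, st: StructType, _, _)) =>
        // copy keeps each nested field's type, nullability and metadata.
        val renamed = StructType(st.fields.map(f => f.copy(name = s"${name}_${f.name}")))
        // A struct-to-struct cast matches fields by position, so only names change.
        acc.withColumn(name, col(name).cast(renamed))
      case (acc, _) =>
        acc // non-struct columns are left untouched
    }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("prefix-struct-fields")
      .getOrCreate()
    import spark.implicits._

    // Sample data invented for illustration; the struct name is never hard-coded in the helper.
    val df = Seq(("x", 1L), ("y", 2L)).toDF("a", "f")
      .select(struct(col("a"), col("f")).as("ANY_NAME"))

    prefixStructFields(df).printSchema()
    spark.stop()
  }
}
```

Only top-level structs are handled here; structs nested inside other structs or arrays would need a recursive variant.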