Modify a struct column in a Spark DataFrame
Question
I have a PySpark dataframe that contains a column "student", as follows:
"student" : {
"name" : "kaleem",
"rollno" : "12"
}
The schema for this column in the dataframe is:
structType(List(
name: String,
rollno: String))
I need to modify this column to:
"student" : {
"student_details" : {
"name" : "kaleem",
"rollno" : "12"
}
}
The schema for this in the dataframe must be:
structType(List(
student_details:
structType(List(
name: String,
rollno: String))
))
How can I do this in Spark?
Recommended answer
Use the named_struct function to achieve this.
1. Load the test data
val data =
"""
| {
| "student": {
| "name": "kaleem",
| "rollno": "12"
| }
|}
""".stripMargin
import spark.implicits._ // needed for toDS(); spark-shell imports this automatically
val df = spark.read.json(Seq(data).toDS())
df.show(false)
println(df.schema("student"))
Output:
+------------+
|student |
+------------+
|[kaleem, 12]|
+------------+
StructField(student,StructType(StructField(name,StringType,true), StructField(rollno,StringType,true)),true)
2. Change the schema using named_struct
import org.apache.spark.sql.functions.expr // expr parses a SQL expression string into a Column
val processedDf = df.withColumn("student",
expr("named_struct('student_details', student)")
)
processedDf.show(false)
println(processedDf.schema("student"))
Output:
+--------------+
|student |
+--------------+
|[[kaleem, 12]]|
+--------------+
StructField(student,StructType(StructField(student_details,StructType(StructField(name,StringType,true), StructField(rollno,StringType,true)),true)),false)
For Python, the withColumn call in step 2 works as is; just remove val (and use show(truncate=False) and print in place of show(false) and println).