Spark: Explode a dataframe array of structs and append id
Problem description
I currently have a dataframe with an id and a column which is an array of structs:
root
 |-- id: integer (nullable = true)
 |-- lists: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- amount: double (nullable = true)
Here is an example table with data:
id | lists
-----------
1 | [[a, 1.0], [b, 2.0]]
2 | [[c, 3.0]]
How do I transform the above dataframe to the one below? I need to "explode" the array and append the id at the same time.
id | col1 | col2
-----------------
1 | a | 1.0
1 | b | 2.0
2 | c | 3.0
Edit note:
Note that the two schemas below differ: the first contains an array of structs, while the latter contains just an array of plain elements (longs).
root
 |-- id: integer (nullable = true)
 |-- lists: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- text: string (nullable = true)
 |    |    |-- amount: double (nullable = true)

root
 |-- a: long (nullable = true)
 |-- b: array (nullable = true)
 |    |-- element: long (containsNull = true)
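To make the difference concrete, here is a minimal Scala sketch for the second schema, assuming Spark 3.x with spark-sql on the classpath (the scala-cli directives and version below are only one possible setup). The sample rows and the b_value alias are hypothetical. Because each element of b is already a long, explode returns the value itself, so there are no struct fields to select afterwards.

```scala
//> using scala 2.13
//> using dep org.apache.spark::spark-sql:3.5.1
// Assumption: the directives above are for scala-cli; with sbt, add spark-sql as a dependency instead.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodePrimitives {
  // For array<long>, each exploded element is the long value itself,
  // so the new column can be used directly (no ".field" access needed).
  def transform(df: DataFrame): DataFrame =
    df.select(col("a"), explode(col("b")).as("b_value")) // "b_value" is a hypothetical alias

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-primitives").getOrCreate()
    import spark.implicits._ // enables .toDF on local Seqs

    // Hypothetical sample data matching the second schema: a: long, b: array<long>.
    val df = Seq((1L, Seq(10L, 20L)), (2L, Seq(30L))).toDF("a", "b")
    transform(df).show()
    spark.stop()
  }
}
```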
Recommended answer
explode does exactly that:
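import org.apache.spark.sql.functions._
import spark.implicits._ // for $"..." outside spark-shell; assumes a SparkSession named spark
// explode yields one row per array element in a column named "col"; id is repeated on each row
df.select($"id", explode($"lists")).select($"id", $"col.text", $"col.amount")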
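A self-contained sketch of the same approach, assuming Spark 3.x (the scala-cli directives and version are only one possible setup). It rebuilds the example table with a hypothetical Item case class and aliases the struct fields to col1 and col2 so the result matches the requested output. It uses col(...) instead of $"..." so that transform does not depend on spark.implicits._.

```scala
//> using scala 2.13
//> using dep org.apache.spark::spark-sql:3.5.1
// Assumption: the directives above are for scala-cli; with sbt, add spark-sql as a dependency instead.
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

// Hypothetical case class mirroring the struct elements in the question's schema.
case class Item(text: String, amount: Double)

object ExplodeStructs {
  // One output row per array element; id is repeated for each element,
  // and the struct fields are pulled out and renamed to col1 / col2.
  def transform(df: DataFrame): DataFrame =
    df.select(col("id"), explode(col("lists")).as("item"))
      .select(col("id"), col("item.text").as("col1"), col("item.amount").as("col2"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("explode-structs").getOrCreate()
    import spark.implicits._ // enables .toDF on local Seqs

    // Rebuild the example table from the question: id plus an array of (text, amount) structs.
    val df = Seq(
      (1, Seq(Item("a", 1.0), Item("b", 2.0))),
      (2, Seq(Item("c", 3.0)))
    ).toDF("id", "lists")

    transform(df).show()
    spark.stop()
  }
}
```

The rows printed by show() should correspond to the id | col1 | col2 table in the question. Note that explode drops rows whose array is null or empty; if those ids should be kept with null values, explode_outer (available since Spark 2.2) can be used in place of explode.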