Spark DataFrame: explode list column

Published: 2021/4/8 19:41:59. Tags: apache-spark, apache-spark-sql


Problem description

I've got an output from a Spark Aggregator which is a List[Character]:

case class Character(name: String, secondName: String, faculty: String)
val charColumn = HPAggregator.toColumn
val resultDF = someDF.select(charColumn)

So my DataFrame looks like this:

+-----------------------------------------------+
|               value                           |
+-----------------------------------------------+
|[[harry, potter, gryffindor],[ron, weasley ... |
+-----------------------------------------------+

Now I want to convert it to:

+----------------------------------+
| name  | second_name | faculty    |
+----------------------------------+
| harry | potter      | gryffindor |
| ron   | weasley     | gryffindor |
+----------------------------------+

How can I do that?

Recommended answer

This can be done with the explode DataFrame function: explode the list column into one row per element, then select the fields of each element by index.

Here is an example (PySpark, using a list of lists in place of the list of Character objects):

>>> df = spark.createDataFrame([[[['a','b','c'], ['d','e','f'], ['g','h','i']]]],["col1"])
>>> df.show(20, False)
+---------------------------------------------------------------------+
|col1                                                                 |
+---------------------------------------------------------------------+
|[WrappedArray(a, b, c), WrappedArray(d, e, f), WrappedArray(g, h, i)]|
+---------------------------------------------------------------------+

>>> from pyspark.sql.functions import explode
>>> out_df = df.withColumn("col2", explode(df.col1)).drop('col1')
>>> out_df.show()
+---------+
|     col2|
+---------+
|[a, b, c]|
|[d, e, f]|
|[g, h, i]|
+---------+

>>> out_df.select(out_df.col2[0].alias('c1'), out_df.col2[1].alias('c2'), out_df.col2[2].alias('c3')).show()
+---+---+---+
| c1| c2| c3|
+---+---+---+
|  a|  b|  c|
|  d|  e|  f|
|  g|  h|  i|
+---+---+---+

