Explode array of structs to columns in Spark


Problem Description


I'd like to explode an array of structs to columns (as defined by the struct fields). E.g.

root
 |-- arr: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- id: long (nullable = false)
 |    |    |-- name: string (nullable = true)

should be transformed to

root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)

I am able to do it with:

df
  .select(explode($"arr").as("tmp"))
  .select($"tmp.*")
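A plain-Scala sketch (no Spark required, using a hypothetical `Element` case class) of what those two steps do: `explode` yields one row per array element, and selecting `tmp.*` expands each struct's fields into top-level columns.

```scala
// Hypothetical stand-in for the struct type in the "arr" column
case class Element(id: Long, name: String)

// One row whose "arr" column holds two structs
val arr = Seq(Element(1L, "a"), Element(2L, "b"))

// Step 1 (explode): one row per array element
// Step 2 (tmp.*): each struct's fields become separate columns
val cols = arr.map(e => (e.id, e.name))
// cols: Seq((1L, "a"), (2L, "b"))
```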

How can I do that in a single select statement?

I thought this could work, but unfortunately it does not:

df.select(explode($"arr")(".*"))




Exception in thread "main" org.apache.spark.sql.AnalysisException: No such struct field .* in col;


Recommended Answer

A single-step solution is available only for MapType columns:

import org.apache.spark.sql.functions.explode
import spark.implicits._  // for toDF and the $ column syntax

val df = Seq(Tuple1(Map((1L, "bar"), (2L, "foo")))).toDF

df.select(explode($"_1") as Seq("foo", "bar")).show

+---+---+
|foo|bar|
+---+---+
|  1|bar|
|  2|foo|
+---+---+
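A plain-Scala sketch (no Spark required) of the semantics above: exploding a map column emits one row per (key, value) entry, with the key and value landing in the two aliased columns.

```scala
// The map from the example above
val m = Map(1L -> "bar", 2L -> "foo")

// explode on a MapType column: one (key, value) row per entry;
// sortBy only fixes the ordering for display
val exploded = m.toSeq.sortBy(_._1)
// exploded: Seq((1L, "bar"), (2L, "foo")) -- the "foo" and "bar" columns
```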

With arrays you can use flatMap:

import spark.implicits._  // for toDF and the product encoder

val df = Seq(Tuple1(Array((1L, "bar"), (2L, "foo")))).toDF
df.as[Seq[(Long, String)]].flatMap(identity)
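The flattening step itself can be seen in plain Scala (no Spark required): each row holds a collection of pairs, and `flatMap(identity)` turns every pair into its own row.

```scala
// Two "rows", each holding a collection of (Long, String) structs
val data = Seq(Seq((1L, "bar")), Seq((2L, "foo"), (3L, "baz")))

// flatMap(identity) concatenates the inner collections,
// so every pair becomes its own row
val flattened = data.flatMap(identity)
// flattened: Seq((1L, "bar"), (2L, "foo"), (3L, "baz"))
```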

A single SELECT statement can be written in SQL:

df.createOrReplaceTempView("df")

spark.sql("SELECT x._1, x._2 FROM df LATERAL VIEW explode(_1) t AS x")

