Spark dataframe - Split struct column into 2 columns


Question

I have a data frame containing (what I think are) pairs of (String, String).

It looks like this:

> df.show
| Col1 | Col2    |
| A    | [k1, v1]|
| A    | [k2, v2]|

> df.printSchema
|-- _1: string (nullable = true)
|-- _2: struct (nullable = true)
|    |-- _1: string (nullable = true)
|    |-- _2: string (nullable = true)

Col2 used to contain a Map[String, String], on which I called toList() and then explode() to obtain one row per mapping present in the original Map.


I would like to split Col2 into 2 columns and obtain this dataframe:

| Col1 | key    | value |
| A    | k1     | v1    |
| A    | k2     | v2    |

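Does anyone know how to do this?

Alternatively, does anyone know how to explode and split a Map into multiple rows (one per mapping) and 2 columns (one for the key, one for the value)?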


I tried the pattern that usually works for me, casting to (String, String), but it fails:

df.select("Col1", "Col2").
   map(r =>(r(0).asInstanceOf[String],
            r(1).asInstanceOf[(String, String)](0),
            r(1).asInstanceOf[(String, String)](1)
           )
       )

Caused by: java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2

==> I guess the type of Col2 is org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema, but I could not find Spark/Scala documentation for it.

And even if that worked, there would still be the issue that indexing is not the right way to access the elements of a tuple...

Thanks!

Recommended answer

You can use select to project each field of the struct and so unpack it.

df.select($"Col1", $"Col2._1".as("key"), $"Col2._2".as("value"))
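For anyone who wants to try this end to end, here is a minimal sketch. It is not part of the original answer, and it assumes a local SparkSession and sample data rebuilt from the question. The object name SplitStructExample and the helper names splitStruct and explodeMap are hypothetical. The sketch shows three things: the select from the answer, written with col(...) so that it does not rely on the $ syntax from spark.implicits._; the alternative the question asks about, where explode is applied directly to the map column and Spark names the generated columns key and value; and a map over rows that reads the struct as a Row, which is why the cast to Tuple2 in the question fails.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object SplitStructExample {

  // Approach from the answer: address each struct field by its dotted path.
  def splitStruct(df: DataFrame): DataFrame =
    df.select(col("Col1"), col("Col2._1").as("key"), col("Col2._2").as("value"))

  // Alternative: explode a MapType column directly. Spark emits one row per
  // entry and names the two generated columns "key" and "value" by default.
  def explodeMap(df: DataFrame): DataFrame =
    df.select(col("Col1"), explode(col("Col2")))

  def main(args: Array[String]): Unit = {
    // Assumption: a throwaway local session, just for this illustration.
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("split-struct-example")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample data mirroring the question. Scala tuples are
    // encoded as structs with fields _1 and _2.
    val structDf = Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")
    splitStruct(structDf).show()

    // Same data one step earlier, while Col2 is still a Map[String, String].
    val mapDf = Seq(("A", Map("k1" -> "v1", "k2" -> "v2"))).toDF("Col1", "Col2")
    explodeMap(mapDf).show()

    // Row-based variant: a struct value arrives as a Row, not a Tuple2,
    // so read it with getStruct and typed getters instead of casting.
    val viaRows = structDf.map { r =>
      val pair = r.getStruct(1)
      (r.getString(0), pair.getString(0), pair.getString(1))
    }.toDF("Col1", "key", "value")
    viaRows.show()

    spark.stop()
  }
}
```

Of the three, the column-based select is usually the simplest choice. If the map column is still available, exploding it directly avoids the toList() step altogether.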
