Spark dataframe - Split struct column into 2 columns

Views: 66 | Published: 2021/11/14 22:35:40 | Tags: apache-spark, spark-dataframe


Problem description

I have a data frame containing (what I think are) pairs of (String, String).

It looks like this:

> df.show
| Col1 | Col2    |
| A    | [k1, v1]|
| A    | [k2, v2]|

> df.printSchema
|-- _1: string (nullable = true)
|-- _2: struct (nullable = true)
|    |-- _1: string (nullable = true)
|    |-- _2: string (nullable = true)

Col2 used to contain a Map[String, String], on which I called toList() and then explode() to obtain one row per mapping present in the original Map.

I would like to split Col2 into 2 columns and obtain this dataframe:

| Col1 | key    | value |
| A    | k1     | v1    |
| A    | k2     | v2    |
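
One possible way to get this shape is to select the nested struct fields by their dotted path, without going through `map` at all. This is only a sketch: it assumes the columns are really named `Col1` and `Col2` (as in the `show` output) and that the struct fields are named `_1` and `_2` (as in the nested part of the schema). The object and function names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object SplitStructColumn {
  // Assumption: df has a string column "Col1" and a struct column "Col2"
  // whose fields are named "_1" and "_2".
  // Each struct field is addressed by its dotted path and renamed.
  def splitPair(df: DataFrame): DataFrame =
    df.select(
      col("Col1"),
      col("Col2._1").as("key"),
      col("Col2._2").as("value")
    )

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("split-struct")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data: a nested tuple becomes a struct with fields _1 and _2.
    val df = Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")

    splitPair(df).show()
    spark.stop()
  }
}
```

If the real column names differ (the `printSchema` output above shows `_1` and `_2` at the top level), the paths passed to `col` would need to be adjusted accordingly.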

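Does anyone know how to do this?

Alternatively, does anyone know how to explode and split a Map into multiple rows (one per mapping) and 2 columns (one for the key, one for the value)?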
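
For the alternative, a sketch that skips the toList() step: when `explode` is applied directly to a map-typed column, Spark emits one row per entry with two columns named `key` and `value` by default. The example assumes `Col2` still holds the original Map[String, String]. The object and function names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode}

object ExplodeMapColumn {
  // Assumption: df has a string column "Col1" and a map<string,string> column "Col2".
  // explode on a map column yields two columns, "key" and "value", one row per entry.
  def explodeMap(df: DataFrame): DataFrame =
    df.select(col("Col1"), explode(col("Col2")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-map")
      .getOrCreate()
    import spark.implicits._

    // Stand-in data with the map still intact.
    val df = Seq(("A", Map("k1" -> "v1", "k2" -> "v2"))).toDF("Col1", "Col2")

    explodeMap(df).show()
    spark.stop()
  }
}
```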

I tried the pattern that usually works, casting to (String, String), but this fails:

df.select("Col1", "Col2").
   map(r =>(r(0).asInstanceOf[String],
            r(1).asInstanceOf[(String, String)](0),
            r(1).asInstanceOf[(String, String)](1)
           )
       )

Caused by: java.lang.ClassCastException:
org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema cannot be cast to scala.Tuple2

==> I guess the type of Col2 is org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema, but I could not find any Spark/Scala documentation for it.
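
The exception is consistent with struct values being handed back as `Row` objects rather than Scala tuples when a `Row` is read generically. A sketch of the same `map` attempt using the public `Row` accessors (`getStruct`, `getString`) instead of a tuple cast is shown here. It assumes neither column contains nulls, and the object and function names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object StructAsRow {
  // Assumption: "Col1" is a string column, "Col2" is a struct of two strings,
  // and there are no null values (getStruct would return null for a null struct).
  def splitWithMap(df: DataFrame): DataFrame = {
    val spark = df.sparkSession
    import spark.implicits._ // encoder for the (String, String, String) result

    df.select("Col1", "Col2")
      .map { r =>
        val pair: Row = r.getStruct(1) // the struct arrives as a nested Row
        (r.getString(0), pair.getString(0), pair.getString(1))
      }
      .toDF("Col1", "key", "value")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("struct-as-row")
      .getOrCreate()
    import spark.implicits._

    val df = Seq(("A", ("k1", "v1")), ("A", ("k2", "v2"))).toDF("Col1", "Col2")
    splitWithMap(df).show()
    spark.stop()
  }
}
```

Positional access is appropriate here because `Row` fields are addressed by index or by name. Scala tuples, by contrast, are accessed with `._1` and `._2`.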

And even if that worked, there would still be the issue that indexing is not the right way to access the elements of a tuple...

Thanks!
