Create a dataframe from a hashmap with keys as column names and values as rows in Spark

Views: 14 Published: 2021/11/14 23:05:02 Tags: scala apache-spark dataframe apache-spark-sql

Problem description

I have a dataframe with a column of map type, like this:

scala> df.printSchema

root
 |-- A1: map (nullable = true)
 |    |-- key: string
 |    |-- value: string (valueContainsNull = true)

I need to turn all the map keys into column names, with the corresponding values as rows.

For example, say I have two records like this:

1. key1 -> value1, key2 -> value2, key3 -> value3 ....
2. key1 -> value11, key3 -> value13, key4 -> value14 ...

I want the output dataframe to look like this:

key1             key2             key3             key4
value1           value2           value3           null
value11          null             value13          value14

How can I do this?

Recommended answer

First, create an id column to group the data by, then explode the map column A1 into key/value rows, and finally reshape the dataframe using pivot():

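import org.apache.spark.sql.functions.{monotonically_increasing_id, explode, first}
// The $"..." column syntax needs spark.implicits._ (already imported in spark-shell).

df.withColumn("id", monotonically_increasing_id()) // unique (not necessarily consecutive) id per row
  .select($"id", explode($"A1"))                    // one row per (id, key, value)
  .groupBy("id")
  .pivot("key")                                     // each distinct key becomes a column
  .agg(first("value"))                              // a map holds at most one value per key
  .show()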
+---+-------+------+-------+-------+
| id|   key1|  key2|   key3|   key4|
+---+-------+------+-------+-------+
|  0| value1|value2| value3|   null|
|  1|value11|  null|value13|value14|
+---+-------+------+-------+-------+
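
If the set of distinct keys is small enough to collect to the driver, the id column and the groupBy/pivot step can be avoided by reading each key directly out of the map. The following is a minimal sketch under that assumption and is not part of the original answer. The sample data, the local SparkSession, and the names keys and wide are illustrative, and it assumes Spark 2.3 or later (for map_keys and the Map encoder) with default, non-ANSI settings.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode, map_keys}

// Assumption: a local session for experimentation; getOrCreate() reuses an existing session if there is one.
val spark = SparkSession.builder().master("local[*]").appName("map-to-columns").getOrCreate()
import spark.implicits._

// Sample data mirroring the two records in the question.
val df = Seq(
  Map("key1" -> "value1", "key2" -> "value2", "key3" -> "value3"),
  Map("key1" -> "value11", "key3" -> "value13", "key4" -> "value14")
).toDF("A1")

// Collect the distinct map keys to the driver (assumes the key set is small).
val keys: Array[String] = df
  .select(explode(map_keys(col("A1"))).as("key"))
  .distinct()
  .as[String]
  .collect()
  .sorted

// Build one column per key; a key missing from a row's map should come back as null.
val wide = df.select(keys.map(k => col("A1").getItem(k).as(k)): _*)
wide.show()
```

This variant produces no id column and keeps one output row per input row, at the cost of an extra pass over the data to discover the keys.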
