Spark Scala DataFrame: convert a column of Array of Struct to a column of Map

Views: 64 | Published: 2021/11/14 21:57:55 | Tags: scala, apache-spark, apache-spark-sql

Question

I am new to Scala. I have a DataFrame with the following fields:

ID:string, Time:timestamp, Items:array(struct(name:string,ranking:long))

I want to convert the Items field of each row into a hashmap, with name as the key and ranking as the value. I am not sure how to do this.

Recommended answer

This can be done using a UDF:

// Assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._
import org.apache.spark.sql.functions._
import org.apache.spark.sql.Row

// Sample data:
val df = Seq(
  ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
  ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
).toDF("ID", "Time", "Items")

// Create UDF converting array of (String, Long) structs to Map[String, Long]
val arrayToMap = udf[Map[String, Long], Seq[Row]] {
  array => array.map { case Row(key: String, value: Long) => (key, value) }.toMap
}

// apply UDF
val result = df.withColumn("Items", arrayToMap($"Items"))

result.show(false)
// Map rendering in show() varies by Spark version; newer versions print {n1 -> 4, n2 -> 5}
// +---+----+---------------------+
// |ID |Time|Items                |
// +---+----+---------------------+
// |id1|t1  |Map(n1 -> 4, n2 -> 5)|
// |id2|t2  |Map(n3 -> 6, n4 -> 7)|
// +---+----+---------------------+
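
One caveat with the UDF above: the pattern match assumes every array, every struct, and every field is non-null, so a null in the data would fail at runtime. Below is a minimal sketch of a more defensive variant, assuming that silently dropping incomplete entries is acceptable for your data. The names `NullSafeArrayToMap`, `toMap`, and `arrayToMapSafe` are hypothetical and not part of the original answer.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.udf

object NullSafeArrayToMap {
  // Pure conversion logic, kept apart from the UDF wrapper so it can be tested directly.
  // A null array becomes an empty map; null structs and entries with a null
  // name or ranking are skipped (an assumption, adjust to your needs).
  def toMap(items: Seq[Row]): Map[String, Long] =
    Option(items).getOrElse(Seq.empty[Row])
      .filter(_ != null)
      .collect { case Row(key: String, value: Long) => key -> value }
      .toMap

  // UDF wrapper with the same type parameters as the original arrayToMap
  val arrayToMapSafe = udf[Map[String, Long], Seq[Row]](toMap)
}
```

It is applied the same way as the original, for example `df.withColumn("Items", NullSafeArrayToMap.arrayToMapSafe($"Items"))`.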

I can't see a way to do this without a UDF (using Spark's built-in functions only), at least on Spark versions before 2.4, which introduced the built-in map_from_entries function.
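
On Spark 2.4 and later, the built-in `map_from_entries` function in `org.apache.spark.sql.functions` offers a UDF-free route: it turns an array of two-field structs into a map, using the first field as the key and the second as the value. The following is a minimal sketch under that version assumption, not the original answer's method. The object name `ItemsToMapSketch`, the helper `itemsToMap`, and the local `SparkSession` setup are illustrative assumptions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, map_from_entries}

object ItemsToMapSketch {
  // Replace an array<struct<name, ranking>> column with a map<name, ranking> column.
  // `itemsToMap` and its `column` parameter are hypothetical names for illustration.
  def itemsToMap(df: DataFrame, column: String = "Items"): DataFrame =
    df.withColumn(column, map_from_entries(col(column)))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("items-to-map-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer above
    val df = Seq(
      ("id1", "t1", Array(("n1", 4L), ("n2", 5L))),
      ("id2", "t2", Array(("n3", 6L), ("n4", 7L)))
    ).toDF("ID", "Time", "Items")

    val result = itemsToMap(df)
    result.printSchema() // Items should now be reported as a map type
    result.show(false)

    spark.stop()
  }
}
```

Staying with built-in functions lets Spark optimize the expression instead of calling an opaque Scala closure. Handling of duplicate names may differ from the UDF's `toMap`, which keeps the last value, so it is worth checking against the Spark version you run.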
