Spark dataframe to nested map

Question
How can I convert a rather small DataFrame in Spark (max 300 MB) into a nested map in order to improve Spark's DAG? I believe this will be quicker than a join later on (a dynamically built DAG in Spark is a lot slower than a hard-coded one), since the transformed values were created during the train step of a custom estimator. Now I just want to apply them really quickly during the predict step of the pipeline.
import spark.implicits._  // needed for toDF (imported automatically in spark-shell)

val inputSmall = Seq(
  ("A", 0.3, "B", 0.25),
  ("A", 0.3, "g", 0.4),
  ("d", 0.0, "f", 0.1),
  ("d", 0.0, "d", 0.7),
  ("A", 0.3, "d", 0.7),
  ("d", 0.0, "g", 0.4),
  ("c", 0.2, "B", 0.25)
).toDF("column1", "transformedCol1", "column2", "transformedCol2")
This gives the wrong type of map (one Map[String, Any] per row, i.e. an Array[Map[String, Any]], rather than the nested structure I want):
val inputToMap = inputSmall.collect.map(r => Map(inputSmall.columns.zip(r.toSeq):_*))
I would rather have something like:
Map[String, Map[String, Double]]("column1" -> Map("A" -> 0.3, "d" -> 0.0, ...), "column2" -> Map("B" -> 0.25, "g" -> 0.4, ...))
Answer
Edit: removed the collect operation from the final map construction.
If you are using Spark 2+, here's a suggestion:
// map(keyCol, valueCol) builds a single-entry map column per row
import org.apache.spark.sql.functions.map

val inputToMap = inputSmall.select(
  map($"column1", $"transformedCol1").as("column1"),
  map($"column2", $"transformedCol2").as("column2")
)
val cols = inputToMap.columns
val localData = inputToMap.collect  // the DataFrame is small (<= 300 MB), so collecting to the driver is fine

cols.map { colName =>
  // merge the single-entry maps from every row into one Map per column
  colName -> localData.flatMap(_.getAs[Map[String, Double]](colName)).toMap
}.toMap
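To connect this back to the goal of avoiding a join during predict, here is a minimal sketch of how the result could be used as a lookup. It assumes the expression above is assigned to a val transformMap of type Map[String, Map[String, Double]]; lookupCol1, the 0.0 default, and predictInput are made-up names for illustration:

import org.apache.spark.sql.functions.udf

// assumed: val transformMap: Map[String, Map[String, Double]] = <the expression above>
// for inputSmall this would contain e.g.
//   "column1" -> Map("A" -> 0.3, "d" -> 0.0, "c" -> 0.2)
//   "column2" -> Map("B" -> 0.25, "g" -> 0.4, "f" -> 0.1, "d" -> 0.7)

// hypothetical UDF: look up the transformed value for column1, defaulting to 0.0 for unseen keys
val lookupCol1 = udf((key: String) => transformMap("column1").getOrElse(key, 0.0))

// predictInput is a hypothetical DataFrame from the predict step that has a "column1" column
// predictInput.withColumn("transformedCol1", lookupCol1($"column1"))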