Spark Dataframe: Generate an Array of Tuple from a Map type

Problem Description

My downstream system does not support a Map type, but my source does and sends one as such. I need to convert this map into an array of structs (tuples).

Scala supports Map.toArray, which creates an array of tuples for you; that seems like exactly the transformation I need to apply to the Map:

{
  "a" : {
    "b": {
      "key1" : "value1",
      "key2" : "value2"
    },
    "b_" : {
      "array": [
        {
          "key": "key1",
          "value" : "value1"
        },
        {
          "key": "key2",
          "value" : "value2"
        }
      ]
    }
  }
}
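
As a quick illustration of that Scala behaviour (a minimal, hypothetical snippet using the same keys as the JSON above):

val m = Map("key1" -> "value1", "key2" -> "value2")
m.toArray  // Array((key1,value1), (key2,value2)): an Array[(String, String)]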

What is the most efficient way to do this in Spark, assuming that the field to change is also a nested one? For example:

a is the root-level dataframe column

a.b is the map at level 1 (it comes from the source)

a.b_ is the array-of-struct type (this is what I want to generate by converting a.b to an array)

The answer so far goes some of the way, I think; I just need to get the suggested withColumn and UDF to generate the structure shown above.

Thanks!

Recommended Answer

Just use a udf:

import org.apache.spark.sql.functions.udf

val toArray = udf((vs: Map[String, String]) => vs.toArray)  // Map -> Array[(String, String)]

and adjust the input type according to your needs.
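
For the nested field in the question, here is a minimal sketch of one way to wire that udf in with withColumn, assuming a DataFrame df (a placeholder name) whose root column a is a struct containing the map field b, as in the JSON above:

import org.apache.spark.sql.functions.{col, struct, udf}

val toArray = udf((vs: Map[String, String]) => vs.toArray)

val result = df.withColumn(
  "a",
  struct(
    col("a.b").as("b"),            // keep the original map field
    toArray(col("a.b")).as("b_")   // derived array of (key, value) tuples
    // any other existing fields of `a` would need to be listed here as well
  )
)

Note that the tuples returned by the udf surface as struct fields named _1 and _2 rather than key and value; returning a case class from the udf, or renaming afterwards, gives the key/value field names shown in the JSON. On Spark 2.4+ the built-in map_entries function produces an array<struct<key, value>> directly without a udf, and on Spark 3.1+ Column.withField can add b_ without rebuilding the whole struct.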
