Spark Dataframe: Generate an Array of Tuple from a Map type


Question

My downstream sink does not support a Map type, but my source does and sends one. I need to convert this map into an array of structs (tuples).

Scala supports Map.toArray, which creates an array of tuples for you and seems like the function I need to apply to the map:

{
  "a" : {
    "b": {
      "key1" : "value1",
      "key2" : "value2"
    },
    "b_" : {
      "array": [
        {
          "key": "key1",
          "value" : "value1"
        },
        {
          "key": "key2",
          "value" : "value2"
        }
      ]
    }
  }
}
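
As a plain-Scala illustration of the Map.toArray behaviour mentioned above (no Spark involved), the sketch below turns a map into an array of named key/value records. The `Entry` case class and the `MapToArrayDemo` object are hypothetical names introduced only for this example.

```scala
// Hypothetical record type mirroring the {"key": ..., "value": ...} shape of a.b_.array.
final case class Entry(key: String, value: String)

object MapToArrayDemo {
  // Map.toArray yields Array[(String, String)]; each tuple is mapped to a named record.
  def toEntries(m: Map[String, String]): Array[Entry] =
    m.toArray.map { case (k, v) => Entry(k, v) }

  def main(args: Array[String]): Unit = {
    val b = Map("key1" -> "value1", "key2" -> "value2")
    toEntries(b).foreach(println)
  }
}
```

A Scala Map does not guarantee iteration order in general, so the order of the resulting array should not be relied upon.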

What is the most efficient way to do this in Spark, assuming that the field to change is also a nested one? For example:

a is the root-level dataframe column

a.b is the map at level 1 (comes from the source)

a.b_ is an array of structs (this is what I want to generate by converting a.b to an array)

The answer so far goes some of the way, I think; I just can't get the suggested withColumn and UDF to generate the nested structure.

Thanks!

Recommended answer

Just use a udf:

import org.apache.spark.sql.functions.udf
val toArray = udf((vs: Map[String, String]) => vs.toArray)

and adjust the input type according to your needs.
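
For the nested case in the question, one possible way to wire such a UDF into withColumn is to rebuild the struct column `a` with the extra field. This is a minimal sketch, not a definitive implementation: the `Entry` case class, the `NestedMapToArray` object, the `toEntries` and `addEntries` names, and the sample data are all assumptions made for illustration, and it assumes `a` contains only the field `b`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, struct, udf}

object NestedMapToArray {
  // Hypothetical record type so the struct fields are named key/value rather than _1/_2.
  final case class Entry(key: String, value: String)

  // UDF variant of the answer's toArray: a null map stays null, otherwise one Entry per pair.
  val toEntries = udf((vs: Map[String, String]) =>
    if (vs == null) null else vs.toSeq.map { case (k, v) => Entry(k, v) })

  // Rebuild struct column `a`, keeping a.b and adding a.b_ with a single `array` field.
  // Assumption: `a` has no other fields; any others would need to be listed here too.
  def addEntries(df: DataFrame): DataFrame =
    df.withColumn("a", struct(
      col("a.b").as("b"),
      struct(toEntries(col("a.b")).as("array")).as("b_")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("map-to-array").getOrCreate()
    import spark.implicits._

    // Sample frame with a single struct column `a` holding the map field `b`.
    val df = Seq(Tuple1(Map("key1" -> "value1", "key2" -> "value2")))
      .toDF("b")
      .select(struct(col("b")).as("a"))

    addEntries(df).printSchema()
    spark.stop()
  }
}
```

Returning a case class from the UDF is what gives the struct fields the names `key` and `value`; returning plain tuples, as in the one-liner above, yields fields named `_1` and `_2`. Recent Spark releases also ship a built-in `map_entries` function that returns an array of key/value structs, which may remove the need for a UDF if it is available in your version.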
