Spark DataFrame: Generate an Array of Tuples from a Map Type
Question
My downstream system does not support a Map type, but my source does and therefore sends one. I need to convert this map into an array of structs (tuples).
Scala supports Map.toArray, which creates an array of tuples and seems to be the transformation I need to apply to the map:
{
  "a" : {
    "b": {
      "key1" : "value1",
      "key2" : "value2"
    },
    "b_" : {
      "array": [
        {
          "key": "key1",
          "value" : "value1"
        },
        {
          "key": "key2",
          "value" : "value2"
        }
      ]
    }
  }
}
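As a point of reference, the plain Scala behaviour of Map.toArray mentioned in the question can be sketched as follows. This is only an illustration outside Spark; the object and method names (MapToArrayDemo, entriesOf) are hypothetical.

```scala
// Plain Scala, no Spark required.
object MapToArrayDemo {
  // Map.toArray yields one (key, value) tuple per map entry.
  def entriesOf(m: Map[String, String]): Array[(String, String)] = m.toArray

  def main(args: Array[String]): Unit = {
    val b = Map("key1" -> "value1", "key2" -> "value2")
    entriesOf(b).foreach { case (k, v) => println(s"key=$k value=$v") }
  }
}
```

Each tuple corresponds to one key/value object in the desired b_ array; the remaining question is how to apply the same idea to a nested DataFrame column.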
What is the most efficient way to do this in Spark, assuming the field to change is also a nested one? For example:
a is the root-level dataframe column
a.b is the map at level 1 (comes from the source)
a.b_ is an array of structs (this is what I want to generate by converting a.b to an array)
The answer so far goes some of the way, I think; I just can't get the suggested withColumn and UDF to generate the output as below.
Thanks!
Recommended answer
Just use a udf:
import org.apache.spark.sql.functions.udf
val toArray = udf((vs: Map[String, String]) => vs.toArray)
and adjust the input type according to your needs.
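One way the UDF could be wired into the nested column from the question is sketched below. It is a minimal sketch under stated assumptions, not a definitive implementation: the names Entry, MapToEntries, toEntrySeq, withEntries and withEntriesBuiltin are hypothetical, the sample data is made up to match the question's shape, and the struct is rebuilt by hand, so any other fields of a would have to be listed explicitly. The second variant uses the built-in map_entries function instead of a UDF and assumes Spark 3.0 or later.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, map_entries, struct, udf}

// Hypothetical element type: its field names become the struct field names.
case class Entry(key: String, value: String)

object MapToEntries {
  // Pure conversion, kept separate from Spark so it is easy to check.
  def toEntrySeq(vs: Map[String, String]): Seq[Entry] =
    if (vs == null) null else vs.toSeq.map { case (k, v) => Entry(k, v) }

  // Rebuilds struct column `a` with the original map `b` and the derived array `b_`.
  // Note: fields of `a` other than `b` are dropped unless added to the struct call.
  def withEntries(df: DataFrame): DataFrame = {
    val toEntries = udf(toEntrySeq _)
    df.withColumn("a", struct(col("a.b").as("b"), toEntries(col("a.b")).as("b_")))
  }

  // Alternative without a UDF (assumption: Spark 3.0+ for map_entries in the Scala API).
  def withEntriesBuiltin(df: DataFrame): DataFrame =
    df.withColumn("a", struct(col("a.b").as("b"), map_entries(col("a.b")).as("b_")))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-to-entries")
      .getOrCreate()
    import spark.implicits._

    // Sample data shaped like the question: `a` is a struct holding the map `b`.
    val df = Seq(Map("key1" -> "value1", "key2" -> "value2"))
      .toDF("b")
      .select(struct(col("b")).as("a"))

    val viaUdf = withEntries(df)
    viaUdf.printSchema()
    viaUdf.show(truncate = false)

    val viaBuiltin = withEntriesBuiltin(df)
    viaBuiltin.printSchema()
    viaBuiltin.show(truncate = false)

    spark.stop()
  }
}
```

Returning a case class from the UDF gives the array elements the field names key and value; returning plain tuples via vs.toArray, as in the answer, yields fields named _1 and _2 instead. Where the built-in function is available it is likely the cheaper option, since it avoids passing each map through a Scala closure, but that has not been measured here.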