Spark Dataframe: Generate an Array of Tuple from a Map type
Question
My downstream sink does not support the Map type, but my source does and sends it. I need to convert this map into an array of structs (tuples).
Scala supports Map.toArray, which creates an array of tuples and seems to be the function I need to apply to the map. The input (a.b) and the desired output (a.b_) look like this:
{
  "a" : {
    "b": {
      "key1" : "value1",
      "key2" : "value2"
    },
    "b_" : {
      "array": [
        {
          "key": "key1",
          "value" : "value1"
        },
        {
          "key": "key2",
          "value" : "value2"
        }
      ]
    }
  }
}
What is the most efficient way to do this in Spark, given that the field to change is also a nested one? For example:
a is the root-level DataFrame column
a.b is the map at level 1 (comes from the source)
a.b_ is an array of structs (this is what I want to generate by converting a.b to an array)
The answer so far goes some of the way, I think, but I can't get the suggested withColumn and UDF to generate the structure shown above.
Thanks!
Recommended answer
Just use a udf:
import org.apache.spark.sql.functions.udf
val toArray = udf((vs: Map[String, String]) => vs.toArray)
and adjust the input type according to your needs.
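Since the question asks about a nested field, here is a minimal sketch of how such a udf might be applied to a.b while rebuilding the parent struct a. The Entry case class, the sample data, the object name and the local SparkSession are assumptions made for illustration; they are not part of the original answer. The sketch also makes b_ the array itself rather than wrapping it in an extra "array" field as in the JSON above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct, udf}

object MapToArrayExample {
  // Hypothetical case class: gives the array elements the "key"/"value"
  // field names from the question instead of the default _1/_2 of a tuple.
  case class Entry(key: String, value: String)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, used only for this demonstration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("map-to-array")
      .getOrCreate()
    import spark.implicits._

    // Sample data shaped like the question: a struct column "a" holding a map "b".
    val df = Seq(Tuple1(Map("key1" -> "value1", "key2" -> "value2")))
      .toDF("b")
      .select(struct(col("b")).as("a"))

    // Same idea as the udf in the answer, but null-safe and with named fields.
    val toEntries = udf((vs: Map[String, String]) =>
      if (vs == null) null else vs.toSeq.map { case (k, v) => Entry(k, v) })

    // A nested field cannot be added in place with withColumn alone,
    // so the struct "a" is rebuilt with the original map plus the new array.
    val result = df.withColumn(
      "a",
      struct(col("a.b").as("b"), toEntries(col("a.b")).as("b_"))
    )

    result.printSchema()
    result.show(false)
    spark.stop()
  }
}
```

If a has more fields than b, each of them would need to be listed in the struct(...) call as well. On Spark 2.4 and later there is also, as far as I know, a built-in SQL function map_entries that returns an array of key/value structs, which may avoid the udf altogether; check the documentation for your version before relying on it.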