Exploding StructType as MapType in Spark


Question

I need to convert a StructType column to a MapType in Spark and explode it into one key/value row per property.

Schema:

event: struct (nullable = true)
|    | event_category: string (nullable = true)
|    | event_name: string (nullable = true)
|    | properties: struct (nullable = true)
|    |    | prop1: string (nullable = true)
|    |    | prop2: string (nullable = true)

Sample data:

{ "event": {
      "event_category": "abc",
      "event_name": "click",
      "properties" : {
          "prop1": "prop1Value",
          "prop2": "prop2Value",
          ....
      }
   }
}

Required output:

event_category | event_name | properties_key | properties_value
abc            | click      | prop1          | prop1Value
abc            | click      | prop2          | prop2Value

Recommended answer

You need some mechanism to turn the properties struct into key/value pairs. I used a udf function that zips the field names with their values and returns an array of (key, value) pairs.

import org.apache.spark.sql.functions._
// Zip field names with their values; Spark returns the pairs as array<struct<_1, _2>>.
val collectUdf = udf((cols: Seq[String], values: Seq[String]) => cols.zip(values))
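
To make the zipping step concrete, here is a minimal plain-Scala sketch of what the udf body computes, with no Spark involved. The object name `ZipSketch` and the helper `zipProperties` are hypothetical names used only for illustration.

```scala
object ZipSketch {
  // Pair each field name with the value at the same position,
  // which is what the body of the udf does (hypothetical helper name).
  def zipProperties(names: Seq[String], values: Seq[String]): Seq[(String, String)] =
    names.zip(values)

  def main(args: Array[String]): Unit = {
    val names  = Seq("prop1", "prop2")
    val values = Seq("prop1Value", "prop2Value")
    // Print one "key -> value" line per pair.
    zipProperties(names, values).foreach { case (k, v) => println(s"$k -> $v") }
  }
}
```

When a udf returns a sequence of tuples like this, Spark stores it as an array of structs whose fields are named `_1` and `_2`, which is why the final select reads `event_properties._1` and `event_properties._2`. Note that `zip` silently truncates to the shorter input, so the names and values must line up one to one.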

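Since Spark does not support more than one generator (such as explode) per select clause, you will have to save the dataframe to a temporary dataframe.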

// The $"..." column syntax requires import spark.implicits._
val columnsMap = df_json.select($"event.properties.*").columns
val temp = df_json.withColumn("event_properties", explode(collectUdf(lit(columnsMap), array($"event.properties.*"))))

The last step is to split the event_properties column into separate key and value columns:

temp.select($"event.event_category", $"event.event_name", $"event_properties._1".as("properties_key"), $"event_properties._2".as("properties_value")).show(false)

You should get the desired output:

+--------------+----------+--------------+----------------+
|event_category|event_name|properties_key|properties_value|
+--------------+----------+--------------+----------------+
|abc           |click     |prop1         |prop1Value      |
|abc           |click     |prop2         |prop2Value      |
+--------------+----------+--------------+----------------+
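
Because the question asks for a MapType, a variant without a udf may also be worth considering. The following is a minimal sketch, not part of the original answer: it assumes Spark 2.2 or later, a local `SparkSession`, the sample record from the question, and that every property has the same type (string here), since a map column needs a single value type. The names `StructToMapSketch` and `explodeProperties` are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, lit, map}

object StructToMapSketch {
  // Build a MapType column from the fields of event.properties and explode it
  // into one (key, value) row per property. Hypothetical helper name.
  def explodeProperties(df: DataFrame): DataFrame = {
    // Field names of the nested struct, e.g. Array("prop1", "prop2").
    val propNames = df.select("event.properties.*").columns
    // map() expects alternating key and value columns: k1, v1, k2, v2, ...
    val keyValueCols = propNames.flatMap(n => Seq(lit(n), col(s"event.properties.$n")))
    df.select(
      col("event.event_category"),
      col("event.event_name"),
      // Exploding a map yields two columns, aliased here as key and value.
      explode(map(keyValueCols: _*)).as(Seq("properties_key", "properties_value"))
    )
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("struct-to-map").getOrCreate()
    import spark.implicits._
    val json = Seq(
      """{"event":{"event_category":"abc","event_name":"click","properties":{"prop1":"prop1Value","prop2":"prop2Value"}}}"""
    )
    val df = spark.read.json(json.toDS())
    explodeProperties(df).show(false)
    spark.stop()
  }
}
```

This keeps a single generator in the select clause, so no temporary dataframe is needed, and the intermediate column is a genuine map rather than an array of structs.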
