Exploded Struct in Spark

Question

I have a DataFrame with the following schema:

 |-- data: struct (nullable = true)
 |    |-- asin: string (nullable = true)
 |    |-- customerId: long (nullable = true)
 |    |-- eventTime: long (nullable = true)
 |    |-- marketplaceId: long (nullable = true)
 |    |-- rating: long (nullable = true)
 |    |-- region: string (nullable = true)
 |    |-- type: string (nullable = true)
 |-- uploadedDate: long (nullable = true)

I want to explode the struct so that all of its elements, such as asin, customerId, and eventTime, become columns in the DataFrame. I tried the explode function, but it works on Array columns, not on struct types. Is it possible to convert the above DataFrame into the one below?

     |-- asin: string (nullable = true)
     |-- customerId: long (nullable = true)
     |-- eventTime: long (nullable = true)
     |-- marketplaceId: long (nullable = true)
     |-- rating: long (nullable = true)
     |-- region: string (nullable = true)
     |-- type: string (nullable = true)
     |-- uploadedDate: long (nullable = true)

Recommended answer

This is simple:

val newDF = df.select("uploadedDate", "data.*")

This tells Spark to select uploadedDate and then every sub-field of the data struct, each as its own top-level column.
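The one-liner above puts uploadedDate first, whereas the target schema in the question lists it last. Since select keeps the order of its arguments, swapping them should give the requested layout. Below is a minimal sketch, assuming a local SparkSession and a made-up sample row; the object name, app name, and sample values are hypothetical, and only three of the struct's fields are used to keep it short.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.struct

object FlattenStruct {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("flatten-struct")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data, nested into a "data" struct to resemble
    // the shape of the schema in the question (subset of fields).
    val df = Seq(("B000000000", 1L, 5L, 1500000100L))
      .toDF("asin", "customerId", "rating", "uploadedDate")
      .select(struct($"asin", $"customerId", $"rating").as("data"), $"uploadedDate")

    // "data.*" first, so the struct's fields come before uploadedDate.
    val flat = df.select("data.*", "uploadedDate")

    // Column order follows the select arguments:
    // asin, customerId, rating, uploadedDate
    println(flat.columns.mkString(", "))
    flat.printSchema()

    spark.stop()
  }
}
```

In spark-shell the session and implicits are already in scope, so only the two select calls are needed there.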

Example:

scala> case class A(a: Int, b: Double)
scala> val df = Seq((A(1, 1.0), "1"), (A(2, 2.0), "2")).toDF("data", "uploadedDate")
scala> val newDF = df.select("uploadedDate", "data.*")
scala> newDF.show()
+------------+---+---+
|uploadedDate|  a|  b|
+------------+---+---+
|           1|  1|1.0|
|           2|  2|2.0|
+------------+---+---+

scala> newDF.printSchema()
root
 |-- uploadedDate: string (nullable = true)
 |-- a: integer (nullable = true)
 |-- b: double (nullable = true)
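The question mentions that explode only works on Array columns. The two techniques can be combined when a column holds an array of structs: explode first to get one row per element, then use the star to flatten each struct. This is a sketch under that assumption, which differs from the schema in the question; the object name, the "item" alias, and the sample data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

object ExplodeThenFlatten {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only; the app name is arbitrary.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explode-then-flatten")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input: "data" is an array of structs. Scala tuples
    // are encoded as structs with fields named _1 and _2.
    val df = Seq((Seq((1, 1.0), (2, 2.0)), "1")).toDF("data", "uploadedDate")

    val flat = df
      // explode produces one output row per array element
      .select($"uploadedDate", explode($"data").as("item"))
      // the star then promotes each struct field to a top-level column
      .select("uploadedDate", "item.*")

    flat.printSchema()
    flat.show()

    spark.stop()
  }
}
```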
