Exploded Struct in Spark
Question
I have a DataFrame with the following schema:
|-- data: struct (nullable = true)
|    |-- asin: string (nullable = true)
|    |-- customerId: long (nullable = true)
|    |-- eventTime: long (nullable = true)
|    |-- marketplaceId: long (nullable = true)
|    |-- rating: long (nullable = true)
|    |-- region: string (nullable = true)
|    |-- type: string (nullable = true)
|-- uploadedDate: long (nullable = true)
I want to flatten the struct so that each of its fields (asin, customerId, eventTime and so on) becomes a top-level column of the DataFrame. I tried the explode function, but it works on arrays, not on struct types. Is it possible to convert the above DataFrame into the following one:
|-- asin: string (nullable = true)
|-- customerId: long (nullable = true)
|-- eventTime: long (nullable = true)
|-- marketplaceId: long (nullable = true)
|-- rating: long (nullable = true)
|-- region: string (nullable = true)
|-- type: string (nullable = true)
|-- uploadedDate: long (nullable = true)
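Recommended answer
It's simple:
val newDF = df.select("uploadedDate", "data.*")
This selects uploadedDate and then all subfields of the struct column data. explode is not needed here: it turns the elements of an array (or map) into separate rows, whereas a struct is a single value whose named fields can be selected directly, and "data.*" expands them into one column per field.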
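Below is a minimal standalone sketch applying the same select call to the exact schema from the question. It assumes Spark 2.x or later with spark-sql on the classpath. The object name FlattenEventStruct, the helper flatten and the sample row are hypothetical; only the schema is taken from the question.

Unlike spark-shell, a compiled program has to create its own SparkSession. Building the DataFrame from an explicit StructType keeps every field nullable = true, as shown in the question. Putting "data.*" before "uploadedDate" reproduces the column order of the desired schema.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types._

object FlattenEventStruct {
  // Fields of the `data` struct, copied from the question's schema.
  // StructField defaults to nullable = true, matching the question.
  val dataStruct: StructType = StructType(Seq(
    StructField("asin", StringType),
    StructField("customerId", LongType),
    StructField("eventTime", LongType),
    StructField("marketplaceId", LongType),
    StructField("rating", LongType),
    StructField("region", StringType),
    StructField("type", StringType)))

  // Top-level schema: the struct column followed by uploadedDate.
  val schema: StructType = StructType(Seq(
    StructField("data", dataStruct),
    StructField("uploadedDate", LongType)))

  // "data.*" expands every struct field into its own top-level column;
  // listing it first puts uploadedDate last, as in the desired schema.
  def flatten(df: DataFrame): DataFrame =
    df.select("data.*", "uploadedDate")

  def main(args: Array[String]): Unit = {
    // A compiled program must build its own session (spark-shell provides `spark`).
    val spark = SparkSession.builder()
      .appName("flatten-struct")
      .master("local[*]")
      .getOrCreate()

    // One made-up row; values for LongType fields must be Scala Longs.
    val rows = java.util.Arrays.asList(
      Row(Row("B000TEST01", 1L, 1500000000L, 1L, 5L, "EU", "review"), 1500000100L))
    val df = spark.createDataFrame(rows, schema)

    val newDF = flatten(df)
    newDF.printSchema()
    newDF.show()

    spark.stop()
  }
}
```

If the DataFrame were built from Scala case classes with Long fields instead, Spark would mark those columns nullable = false. The explicit schema avoids that difference.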
Example:
scala> case class A(a: Int, b: Double)
scala> val df = Seq((A(1, 1.0), "1"), (A(2, 2.0), "2")).toDF("data", "uploadedDate")
scala> val newDF = df.select("uploadedDate", "data.*")
scala> newDF.show()
+------------+---+---+
|uploadedDate|  a|  b|
+------------+---+---+
|           1|  1|1.0|
|           2|  2|2.0|
+------------+---+---+
scala> newDF.printSchema()
root
|-- uploadedDate: string (nullable = true)
|-- a: integer (nullable = true)
|-- b: double (nullable = true)
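"data.*" copies the struct's field names verbatim. If a field has the same name as an existing top-level column, the result contains two columns with that name, and referring to it later is ambiguous.

A minimal sketch of a workaround, assuming Spark 2.x or later, is to build the column list from the schema with Column.getField and an optional name prefix. The object StructFlattener and the method flattenStruct are hypothetical names, not part of Spark's API. The case class A mirrors the one in the spark-shell example.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructType

// Same shape as the struct in the spark-shell example above.
case class A(a: Int, b: Double)

object StructFlattener {
  // Hypothetical helper: replaces the struct column `structCol` with one
  // top-level column per field, named prefix + fieldName. The remaining
  // columns keep their relative order and come before the expanded fields.
  // Assumes top-level column names contain no dots.
  def flattenStruct(df: DataFrame, structCol: String, prefix: String = ""): DataFrame = {
    val fieldNames: Seq[String] = df.schema(structCol).dataType match {
      case s: StructType => s.fieldNames.toSeq
      case other => throw new IllegalArgumentException(s"$structCol has type $other, not a struct")
    }
    val others: Seq[Column] = df.columns.toSeq.filter(_ != structCol).map(col)
    // getField looks each field up by name; .as renames it with the prefix.
    val expanded: Seq[Column] = fieldNames.map(f => col(structCol).getField(f).as(prefix + f))
    df.select(others ++ expanded: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("flatten-struct-prefix")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((A(1, 1.0), "1"), (A(2, 2.0), "2")).toDF("data", "uploadedDate")
    val flat = flattenStruct(df, "data", prefix = "data_")
    println(flat.columns.mkString(", ")) // uploadedDate, data_a, data_b
    spark.stop()
  }
}
```

With the default empty prefix, the helper produces the same columns as df.select("uploadedDate", "data.*") in the example above.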