How to explode a StructType into rows (rather than columns) from a JSON DataFrame in Spark


Question

I read a nested JSON file with this schema:

 root
 |-- company: struct (nullable = true)
 |    |-- 0: string (nullable = true)
 |    |-- 1: string (nullable = true)
 |    |-- 10: string (nullable = true)
 |    |-- 100: string (nullable = true)
 |    |-- 101: string (nullable = true)
 |    |-- 102: string (nullable = true)
 |    |-- 103: string (nullable = true)
 |    |-- 104: string (nullable = true)
 |    |-- 105: string (nullable = true)
 |    |-- 106: string (nullable = true)
 |    |-- 107: string (nullable = true)
 |    |-- 108: string (nullable = true)
 |    |-- 109: string (nullable = true)

When I try:

df.select(col("company.*"))

I get every field of the struct "company" as a separate column, but I want them as rows: one row per field, with the id in one column and the string in another. So instead of:

  0        1         10       100      101        102 
"hey"   "yooyo"    "yuyu"    "hey"   "yooyo"    "yuyu"

I would like to get something like:

id    name
0     "hey"
1     "yooyo"
10    "yuyu"
100   "hey"
101   "yooyo"
102   "yuyu"

Thanks in advance for your help,

Tricky
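To try the answers locally, an input with the same shape can be built from an inline JSON string. This is a minimal sketch, assuming a local `SparkSession` and Spark 2.2+ (for `spark.read.json` on a `Dataset[String]`); the record is hypothetical and uses only six of the fields, with the sample values from the question. Schema inference turns the JSON object into a struct, which is why `company.*` expands to one column per field.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Local session for experimentation (in spark-shell this returns the existing one)
val spark = SparkSession.builder().master("local[1]").appName("struct-sample").getOrCreate()
import spark.implicits._

// Hypothetical record with the question's shape (only a subset of the fields)
val json = """{"company": {"0": "hey", "1": "yooyo", "10": "yuyu", "100": "hey", "101": "yooyo", "102": "yuyu"}}"""
val df = spark.read.json(Seq(json).toDS())

// Inference produces company: struct<0: string, 1: string, 10: string, ...>
df.printSchema()

// Expanding the struct gives one column per field: the wide layout from the question
val wide = df.select(col("company.*"))
wide.show()
```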

Answer

Try this, using union:

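import org.apache.spark.sql.functions.{col, lit}

// One column per field of the "company" struct
val dfExpl = df.select("company.*")

// One single-row DataFrame per field (field name as "id", value as "name"),
// then stack them all with union (union matches columns by position)
dfExpl.columns
  .map(name => dfExpl.select(lit(name).as("id"), col(name).as("name")))
  .reduce(_ union _)
  .show()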
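Another option, not part of the original answer, is Spark's built-in SQL generator `stack(n, expr1, ..., exprk)`, which spreads its arguments over `n` rows in a single projection instead of one `union` per field. This is a minimal sketch, assuming Spark 2.2+ and the same hypothetical sample record as above; `stackExpr` is an illustrative name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

// Hypothetical sample input with the question's shape
val json = """{"company": {"0": "hey", "1": "yooyo", "10": "yuyu", "100": "hey", "101": "yooyo", "102": "yuyu"}}"""
val df = spark.read.json(Seq(json).toDS())

val dfExpl = df.select(col("company.*"))
val fields = dfExpl.columns

// Builds: stack(6, '0', `0`, '1', `1`, ...) as (id, name)
// Each pair is (field name as a string literal, field value via a backquoted column).
val stackExpr = fields
  .map(f => s"'$f', `$f`")
  .mkString(s"stack(${fields.length}, ", ", ", ") as (id, name)")

val rows = dfExpl.selectExpr(stackExpr)
rows.show()
```

The string-built expression assumes field names contain no single quotes or backticks; such names would need escaping first.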

Or, using array/explode:

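import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

// One column per field of the "company" struct
val dfExpl = df.select("company.*")

// For every field, build struct(id = field name, name = field value)
val selectExpr = dfExpl
  .columns
  .map(name =>
    struct(
      lit(name).as("id"),
      col(name).as("name")
    )
  )

// Pack the structs into one array column, explode it into one row per field,
// then unpack each struct into the id/name columns
dfExpl
  .select(explode(array(selectExpr: _*)).as("col"))
  .select("col.*")
  .show()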
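If the field names are really ids rather than a fixed set of columns, another option, not part of the original answer, is to read `company` as a `map<string,string>` by supplying an explicit schema; `explode` on a map column then emits one row per key/value pair directly. This is a minimal sketch, assuming Spark 2.2+ and the same hypothetical sample record as above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, explode}
import org.apache.spark.sql.types.{MapType, StringType, StructType}

val spark = SparkSession.builder().master("local[1]").getOrCreate()
import spark.implicits._

// Hypothetical sample input with the question's shape
val json = """{"company": {"0": "hey", "1": "yooyo", "10": "yuyu", "100": "hey", "101": "yooyo", "102": "yuyu"}}"""

// Declare "company" as map<string,string> instead of letting Spark infer a struct
val schema = new StructType().add("company", MapType(StringType, StringType))
val dfMap = spark.read.schema(schema).json(Seq(json).toDS())

// explode on a map yields "key" and "value" columns, renamed here to id/name
val rows = dfMap.select(explode(col("company")).as(Seq("id", "name")))
rows.show()
```

This avoids generating one expression per field, which may help when the struct has many fields. The order of exploded rows should not be relied on; sort explicitly if it matters.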
