Spark dataframes: convert nested JSON to separate columns
Problem description
I have a stream of JSONs with the following structure that gets converted to a dataframe:
{
  "a": 3936,
  "b": 123,
  "c": "34",
  "attributes": {
    "d": "146",
    "e": "12",
    "f": "23"
  }
}
The dataframe's show function produces the following output:
sqlContext.read.json(jsonRDD).show
+----+-----------+---+---+
| a| attributes| b| c|
+----+-----------+---+---+
|3936|[146,12,23]|123| 34|
+----+-----------+---+---+
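The reason d, e and f collapse into a single attributes column is that Spark infers them as fields of a struct. A quick way to see this is printSchema (a standard DataFrame method); a minimal sketch, reusing the sqlContext and jsonRDD from above:

```scala
// Inspect the inferred schema: d, e and f are fields of the
// `attributes` struct, which is why show renders them as one column.
val df = sqlContext.read.json(jsonRDD)
df.printSchema()
// root
//  |-- a: long (nullable = true)
//  |-- attributes: struct (nullable = true)
//  |    |-- d: string (nullable = true)
//  |    |-- e: string (nullable = true)
//  |    |-- f: string (nullable = true)
//  ...
```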
How can I split the attributes column (a nested JSON structure) into attributes.d, attributes.e and attributes.f as separate columns in a new dataframe, so that the new dataframe has the columns a, b, c, attributes.d, attributes.e and attributes.f?
Recommended answer
If you want columns named a through f:

df.select("a", "b", "c", "attributes.d", "attributes.e", "attributes.f")
If you want columns named with the attributes. prefix:

df.select($"a", $"b", $"c", $"attributes.d" as "attributes.d", $"attributes.e" as "attributes.e", $"attributes.f" as "attributes.f")
If the names of your columns are supplied from an external source (e.g. configuration):
val colNames = Seq("a", "b", "c", "attributes.d", "attributes.e", "attributes.f")
df.select(colNames.head, colNames.tail: _*).toDF(colNames: _*)
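Putting it together, here is a self-contained sketch of the flattening step. The spark session name and the inline JSON literal are illustrative assumptions (e.g. a spark-shell session); the select call itself is the answer's technique:

```scala
// Flatten the nested `attributes` struct into top-level columns.
// Assumes a SparkSession named `spark` is already in scope.
import spark.implicits._

val json = Seq(
  """{"a": 3936, "b": 123, "c": "34",
      "attributes": {"d": "146", "e": "12", "f": "23"}}""")
val df = spark.read.json(json.toDS)

val flat = df.select($"a", $"b", $"c",
  $"attributes.d" as "attributes.d",
  $"attributes.e" as "attributes.e",
  $"attributes.f" as "attributes.f")
flat.show()
```

On Spark 2.x and later, df.select($"a", $"b", $"c", $"attributes.*") expands all struct fields in one go, though the resulting columns are then named d, e and f without the attributes. prefix.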