在Scala中使用from_json（）解析一个DataFrame中的多个行 [英] Use from_json() in Scala to parse multiple Rows in a DataFrame

查看：565 发布时间：2020/10/17 0:29:34 json scala apache-spark dataframe

本文介绍了在Scala中使用from_json（）解析一个DataFrame中的多个行的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我在Spark DataFrame的列中有一个JSON，如下所示：

I've a JSON within a Column of a Spark DataFrame as follows:

ID|           Text|           JSON
------------------------------------------------------------------------------
1|             xyz|          [{"Hour": 1, "Total": 10, "Fail": 1}, {"Hour": 2, "Total": 40, "Fail": 4}, {"Hour": 3, "Total": 20, "Fail": 2}]

我正在使用以下架构

val schema = StructType(Array(StructField("Hour", IntegerType),
   StructField("Total", IntegerType), StructField("Fail", IntegerType))

我正在使用以下代码来解析DataFrame并输出JSON作为多列

I'm using following code to parse the DataFrame and output the JSON as multiple columns

val newDF = DF.withColumn("JSON", from_json(col("JSON"), schema)).select(col("JSON.*"))
newDF.show()

上面的代码只是解析JSON中的一条记录，但是，我希望它解析JSON中的所有记录。

The above code just parses the one single record from the JSON. But, I want it to parse all the records in the JSON.

输出如下：

Hour|       Total|       Fail|
-------------------------------
   1|          10|          1|
-------------------------------

但是，我希望输出如下：

But, I want the output to be as follows:

Hour|       Total|       Fail|
-------------------------------
   1|          10|          1|
   2|          40|          4|
   3|          20|          2|
-------------------------------

可以有人，请告诉我。

Can Someone, please let me know. What is it that I'm missing !!

预先感谢。

推荐答案

如果我正确解释了示例数据，则您的 JSON 列是带有已发布架构的JSON元素序列。在应用 from_json 之前，您需要爆炸该列，如下所示：

If I interpret your sample data correctly, your JSON column is a sequence of JSON elements with your posted schema. You'll need to explode the column before applying from_json as follows:

val df = Seq(
  (1, "xyz", Seq("""{"Hour": 1, "Total": 10, "Fail": 1}""",
                 """{"Hour": 2, "Total": 40, "Fail": 4}""",
                 """{"Hour": 3, "Total": 20, "Fail": 2}""")
  )).toDF("ID", "Text", "JSON")

import org.apache.spark.sql.types._

val jsonSchema = StructType(Array(
  StructField("Hour", IntegerType),
  StructField("Total", IntegerType),
  StructField("Fail", IntegerType)
))

df.
  withColumn("JSON", explode(col("JSON"))).
  withColumn("JSON", from_json(col("JSON"), jsonSchema)).
  select("JSON.*").
  show
// +----+-----+----+
// |Hour|Total|Fail|
// +----+-----+----+
// |   1|   10|   1|
// |   2|   40|   4|
// |   3|   20|   2|
// +----+-----+----+

这篇关于在Scala中使用from_json（）解析一个DataFrame中的多个行的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持IT屋！

查看全文

在Scala中使用from_json（）解析一个DataFrame中的多个行 [英] Use from_json() in Scala to parse multiple Rows in a DataFrame

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

在Scala中使用from_json（）解析一个DataFrame中的多个行 [英] Use from_json() in Scala to parse multiple Rows in a DataFrame

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭