How to parse the JSON data using Spark-Scala


Problem Description

I have a requirement to parse the JSON data as shown in the expected results below. Currently, I can't work out how to include the signal names (ABS, ADA, ADW) in the SIGNAL column. Any help would be much appreciated.

I tried something that gives the results shown below, but I also need to include all the signals in the SIGNAL column, as shown in the expected results.

jsonDF.select(explode($"ABS") as "element")
    .withColumn("stime", col("element.E"))
    .withColumn("can_value", col("element.V"))
    .drop(col("element"))
    .show()

+----------+----------+
|     stime| can_value|
+----------+----------+
|value of E|value of V|
+----------+----------+

df.printSchema

root
 |-- ABS: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- E: long (nullable = true)
 |    |    |-- V: long (nullable = true)
 |-- ADA: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- E: long (nullable = true)
 |    |    |-- V: long (nullable = true)
 |-- ADW: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- E: long (nullable = true)
 |    |    |-- V: long (nullable = true)
 |-- ALT: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- E: long (nullable = true)
 |    |    |-- V: double (nullable = true)
 |-- APP: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- E: long (nullable = true)
 |    |    |-- V: double (nullable = true)
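
(For context, a single input record matching the schema above would presumably look something like the JSON line below; the field values and the file path are placeholder assumptions, not taken from the question. Such a file can be loaded with spark.read.json:)

{"ABS":[{"E":1554399,"V":1}],"ADA":[{"E":1554400,"V":0}],"ADW":[{"E":1554401,"V":1}],"ALT":[{"E":1554402,"V":12.5}],"APP":[{"E":1554403,"V":0.75}]}

val jsonDF = spark.read.json("path/to/input.json")
jsonDF.printSchema()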

I need the output to look like the following:

+------+----------+----------+
|SIGNAL|     stime| can_value|
+------+----------+----------+
|   ABS|value of E|value of V|
|   ADA|value of E|value of V|
|   ADW|value of E|value of V|
+------+----------+----------+

Recommended Answer

To get the expected output and insert the values into the SIGNAL column:

jsonDF.select(explode($"ABS") as "element")
    .withColumn("stime", col("element.E"))
    .withColumn("can_value", col("element.V"))
    .drop(col("element"))
    .withColumn("SIGNAL",lit("ABS"))
    .show()
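
This produces the same two columns as before, plus the literal signal name (the SIGNAL column appears last because it is added after the others):

+----------+----------+------+
|     stime| can_value|SIGNAL|
+----------+----------+------+
|value of E|value of V|   ABS|
+----------+----------+------+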

A generic version of the above approach:

(Based on the result of df.printSchema, this assumes that the signal names are the column names and that each of those columns contains an array whose elements are of the form struct(E, V).)

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode, lit}

val columns: Array[String] = df.columns

var arrayOfDFs: Array[DataFrame] = Array()

for (col_name <- columns) {
  // explode the array for this signal and tag every row with the column name
  val temp = df.selectExpr("explode(" + col_name + ") as element")
    .select(
      lit(col_name).as("SIGNAL"),
      col("element.E").as("stime"),
      col("element.V").as("can_value"))

  arrayOfDFs = arrayOfDFs :+ temp
}

val jsonDF = arrayOfDFs.reduce(_ union _)
jsonDF.show(false)
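
The same result can also be built without the mutable arrayOfDFs, by mapping over the column names and reducing with union; a minimal sketch, equivalent in behavior to the loop above:

val result = df.columns
  .map { signal =>
    // one exploded DataFrame per signal column, tagged with its name
    df.selectExpr(s"explode($signal) as element")
      .select(
        lit(signal).as("SIGNAL"),
        col("element.E").as("stime"),
        col("element.V").as("can_value"))
  }
  .reduce(_ union _)

result.show(false)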
