Converting DataSet to Json Array Spark using Scala

Published: 2021/11/14 22:13:15. Tags: json scala apache-spark apache-spark-sql apache-spark-dataset

Problem description

I am new to Spark and cannot figure out a solution to the following problem.

I have a JSON file to parse. From it I compute a couple of metrics and then write the results back out in JSON format.

Here is the code I am using:

import org.apache.spark.sql._
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.functions._

object quick2 {

  def main(args: Array[String]): Unit = {
    // Only log errors from loggers under "org" (Spark, Hadoop, ...).
    Logger.getLogger("org").setLevel(Level.ERROR)
    val spark = SparkSession
      .builder
      .appName("quick1")
      .master("local[*]")
      .getOrCreate()

    val rawData = spark.read.json("/home/umesh/Documents/Demo2/src/main/resources/sampleQuick.json")

    // Distinct (mal_name, cust_id) and (file_md5, mal_name) pairs, one JSON string per row.
    val mat1 = rawData.select(rawData("mal_name"), rawData("cust_id")).distinct().orderBy("cust_id").toJSON.cache()
    val mat2 = rawData.select(rawData("file_md5"), rawData("mal_name")).distinct().orderBy(asc("file_md5")).toJSON.cache()

    // saveAsTextFile returns Unit, so there is nothing useful to bind to a val.
    mat1.coalesce(1).toJavaRDD.saveAsTextFile("/home/umesh/Documents/Demo2/src/test/mat1/")
    mat2.coalesce(1).toJavaRDD.saveAsTextFile("/home/umesh/Documents/Demo2/src/test/mat2/")
  }
}

The code above writes valid JSON. However, a result can contain several rows with the same key, for example:

md5   mal_name
1       a
1       b
2       c
3       d
3       e

With the code above, every object is written on its own line, like this:

{"file_md5":"1","mal_name":"a"}
{"file_md5":"1","mal_name":"b"}
{"file_md5":"2","mal_name":"c"}
{"file_md5":"3","mal_name":"d"}

and so on.

But I want to combine the values that share a key, so the output should be:

{"file_md5":"1","mal_name":["a","b"]}

Can somebody please suggest what I should do here, or whether there is a better way to approach this problem?

Thanks!
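
One way this might be approached, offered as a minimal sketch rather than a confirmed answer: group the rows by file_md5 and gather the mal_name values into an array with collect_list before calling toJSON. The object name GroupMalNames, the helper groupedJson, and the in-memory sample rows are assumptions made for illustration; they stand in for the rawData read from the JSON file in the question. sort_array is added only so that the order inside each array is deterministic.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, sort_array}

// Hypothetical object and helper names, used only for this illustration.
object GroupMalNames {

  // Returns one JSON string per file_md5, with mal_name collected into an array.
  def groupedJson(spark: SparkSession): Array[String] = {
    import spark.implicits._

    // Assumed sample data mirroring the table in the question.
    val rows = Seq(("1", "a"), ("1", "b"), ("2", "c"), ("3", "d"), ("3", "e"))
      .toDF("file_md5", "mal_name")

    rows
      .distinct()                                  // drop exact duplicate pairs
      .groupBy("file_md5")                         // one output row per key
      .agg(sort_array(collect_list("mal_name")).as("mal_name"))
      .orderBy("file_md5")
      .toJSON                                      // Dataset[String], one JSON object per row
      .collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("group-mal-names")
      .master("local[*]")
      .getOrCreate()

    groupedJson(spark).foreach(println)
    spark.stop()
  }
}
```

In the original program the same groupBy and agg step would presumably go between distinct() and orderBy(asc("file_md5")) when building mat2, and the existing coalesce(1) and saveAsTextFile calls could stay as they are. Keys that have only one value still come out as a one-element array, for example "mal_name":["c"].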
