How to save RDD data into json files, not folders


Question

I am receiving streaming data myDStream (DStream[String]) that I want to save in S3 (for this question it doesn't really matter where exactly I want to save the output, but I am mentioning it just in case).

The following code works well, but it saves folders with names like jsonFile-19-45-46.json, and inside each folder it saves the files _SUCCESS and part-00000.

Is it possible to save each RDD[String] (these are JSON strings) into a JSON file rather than a folder? I thought repartition(1) would do the trick, but it didn't.

    myDStream.foreachRDD { rdd =>
      // datetimeString = ....
      rdd.repartition(1).saveAsTextFile("s3n://mybucket/keys/jsonFile-" + datetimeString + ".json")
    }

Answer

AFAIK there is no option to save the output as a single file. Spark is a distributed processing framework, and writing to a single file is not good practice; instead, each partition writes its own file under the specified path.
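That said, if a single output file is required, one common workaround is to let Spark write the part- files to a temporary directory and then merge them into one file with the Hadoop FileSystem API. This is only a sketch: it assumes Hadoop 2.x (FileUtil.copyMerge was removed in Hadoop 3), and the bucket and path names are placeholders, not ones from the question.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

myDStream.foreachRDD { rdd =>
  // datetimeString = ....
  // Hypothetical paths: a scratch directory for Spark's part- files,
  // and the final single-file destination.
  val tmpDir  = "s3n://mybucket/tmp/jsonFile-" + datetimeString
  val dstFile = "s3n://mybucket/keys/jsonFile-" + datetimeString + ".json"

  // Spark still writes a directory of part- files here.
  rdd.saveAsTextFile(tmpDir)

  // Merge all part- files from tmpDir into one destination file,
  // deleting the temporary directory afterwards.
  val conf = new Configuration()
  val fs   = FileSystem.get(new java.net.URI(tmpDir), conf)
  FileUtil.copyMerge(fs, new Path(tmpDir), fs, new Path(dstFile),
    /* deleteSource = */ true, conf, null)
}
```

The merge happens on a single machine, so this is only reasonable when each batch is modest in size.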

We can only pass the output directory where we want to save the data. The OutputWriter will create one or more files inside the specified path (depending on the number of partitions), with the part- file-name prefix.
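If each streaming batch is small, another sketch is to collect the RDD to the driver and write one file there. This pulls the whole batch into driver memory, so it is unsafe for large batches, and the file name below is a placeholder:

```scala
import java.io.PrintWriter

myDStream.foreachRDD { rdd =>
  // datetimeString = ....
  val lines = rdd.collect()  // brings the entire batch to the driver
  val writer = new PrintWriter("jsonFile-" + datetimeString + ".json")
  try lines.foreach(writer.println) finally writer.close()
  // Uploading the resulting local file to S3 would additionally
  // require an S3 client (e.g. the AWS SDK).
}
```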

