Save a Spark RDD to the local file system using Java
Problem Description
I have an RDD that is generated using Spark. Now if I write this RDD to a CSV file, I am provided with methods like saveAsTextFile(), which outputs a CSV file to HDFS.
I want to write the file to my local file system so that my SSIS process can pick up the files from the system and load them into the DB.
I am currently unable to use Sqoop.
Is there any way to do this in Java, other than writing shell scripts?
If any clarification is needed, please let me know.
Recommended Answer
saveAsTextFile is able to take local file system paths (e.g. file:///tmp/magic/...). However, if you're running on a distributed cluster, each executor writes its partitions to its own local disk, so you most likely want to collect() the data back to the driver and then save it with standard file operations.
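The collect-then-write approach can be sketched as below. This is a minimal illustration, not the only way to do it: the class name LocalCsvWriter, the helper writeLocalCsv, and the /tmp/magic/output.csv path are made up for the example, and the rdd.collect() call is shown only in a comment since it requires a running Spark context. The local write itself uses only java.nio.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

public class LocalCsvWriter {

    // Hypothetical helper: writes rows that were already collected from the
    // RDD on the driver (e.g. via rdd.collect()) to a single CSV file on the
    // driver's local file system, creating parent directories as needed.
    public static Path writeLocalCsv(List<String> rows, String path) throws IOException {
        Path out = Paths.get(path);
        if (out.getParent() != null) {
            Files.createDirectories(out.getParent());
        }
        // Files.write writes each string as one line, separated by newlines.
        return Files.write(out, rows);
    }

    public static void main(String[] args) throws IOException {
        // In a real Spark job this list would come from the driver side:
        //   List<String> rows = rdd.collect();
        List<String> rows = List.of("id,name", "1,alice", "2,bob");
        Path out = writeLocalCsv(rows, "/tmp/magic/output.csv");
        System.out.println("Wrote " + Files.size(out) + " bytes to " + out);
    }
}
```

Once the file is on the local file system, an external process such as SSIS can pick it up directly. Note that collect() pulls the entire dataset into driver memory, so this is only appropriate when the result comfortably fits on one machine; otherwise saveAsTextFile("file:///...") on a single-node setup, or writing to HDFS and copying out, is safer.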