Writing to a file in Apache Spark


Problem description

I am writing Scala code that requires me to write to a file in HDFS. When I use FileWriter.write locally it works, but the same approach does not work on HDFS. Upon checking, I found the following options for writing in Apache Spark: RDD.saveAsTextFile and DataFrame.write.format.

My question is: what if I just want to write an int or string to a file in Apache Spark?

Follow-up: I need to write a header and the DataFrame contents to an output file, and then append some string at the end. Does sc.parallelize(Seq(<String>)) help?

Recommended answer

Create an RDD with your data (int/string) using Seq; see parallelized-collections in the Spark programming guide for details:

sc.parallelize(Seq(5))             // for writing an int (5)
sc.parallelize(Seq("Test String")) // for writing a string


val conf = new SparkConf().setAppName("Writing Int to File").setMaster("local")
val sc = new SparkContext(conf) 
val intRdd= sc.parallelize(Seq(5))   
intRdd.saveAsTextFile("out\\int\\test")


val conf = new SparkConf().setAppName("Writing string to File").setMaster("local")
val sc = new SparkContext(conf)   
val stringRdd = sc.parallelize(Seq("Test String"))
stringRdd.saveAsTextFile("out\\string\\test")
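
For the follow-up (writing a header, the DataFrame contents, and then a trailing string), one workable approach is to turn each piece into an RDD of lines and union them. The following is a minimal sketch rather than part of the original answer: the DataFrame df, its columns, and the output path are placeholder assumptions, and it presumes Spark 2.x with SparkSession available.

import org.apache.spark.sql.SparkSession

// Sketch only: df, its schema, and the output path are assumed for illustration.
val spark = SparkSession.builder().appName("Header and Footer").master("local").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value") // placeholder DataFrame

val header = spark.sparkContext.parallelize(Seq("id,value"))    // header line
val body   = df.rdd.map(row => row.mkString(","))               // each Row as CSV text
val footer = spark.sparkContext.parallelize(Seq("end-of-file")) // trailing string

// union keeps the partitions in order (header, body, footer); coalesce(1)
// merges them into a single part file so the lines come out in that order.
(header union body union footer).coalesce(1).saveAsTextFile("out/combined/test")

Note that saveAsTextFile writes a directory of part files rather than a single named file; if an exact file name is required, the part file has to be renamed afterwards (for example via the Hadoop FileSystem API).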

