Print the content of streams (Spark Streaming) in a Windows system
Problem description
I just want to print the content of a stream to the console. I wrote the following code, but nothing is printed. Can anyone help me read a text file as a stream in Spark? Is there a problem related to the Windows system?
public static void main(String[] args) throws Exception {
    SparkConf sparkConf = new SparkConf().setAppName("My app")
            .setMaster("local[2]")
            .setSparkHome("C:\\Spark\\spark-1.5.1-bin-hadoop2.6")
            .set("spark.executor.memory", "2g");
    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));
    JavaDStream<String> dataStream = jssc.textFileStream("C://testStream//copy.csv");
    dataStream.print();
    jssc.start();
    jssc.awaitTermination();
}
UPDATE: The content of copy.csv is
0,0,12,5,0
0,0,12,5,0
0,1,2,0,42
0,0,0,0,264
0,0,12,5,0
textFileStream is for monitoring Hadoop-compatible directories. This operation watches the provided directory, and as you add new files to that directory, it reads/streams the data from the newly added files.

You cannot read a text/CSV file using textFileStream; or rather, I would say that you do not need streaming if you are just reading a file.

My suggestion would be to monitor some directory (it can be on HDFS or the local file system), then add files to it and capture the content of those new files using textFileStream.

In your code, you could replace "C://testStream//copy.csv" with "C://testStream", and once your Spark Streaming job is up and running, add the file copy.csv to the C://testStream folder and watch the output on the Spark console.
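One detail worth noting with this approach: textFileStream only picks up files that appear in the watched directory after the job starts, and it expects each file to appear atomically (a complete file, not one still being written). A minimal sketch of dropping a file in safely, using only java.nio.file (class and method names here are hypothetical helpers, not part of Spark):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class StreamFileDropper {
    // Stage the file next to the watched directory, then move it in with an
    // atomic rename so the streaming job never sees a half-written file.
    public static Path dropInto(Path watchedDir, String name, Iterable<String> lines)
            throws Exception {
        Files.createDirectories(watchedDir);
        // Stage on the same filesystem (the watched dir's parent) so that
        // ATOMIC_MOVE is supported; assumes watchedDir is not a root directory.
        Path staging = Files.createTempFile(watchedDir.getParent(), "staging", ".csv");
        Files.write(staging, lines);
        return Files.move(staging, watchedDir.resolve(name),
                StandardCopyOption.ATOMIC_MOVE);
    }
}
```

With the Spark job monitoring the watched folder, calling dropInto(Paths.get("C:/testStream"), "copy.csv", lines) would make the file visible to textFileStream in one step.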
OR
Maybe you can write another command-line Scala/Java program that reads the files and sends the content over a socket (at a certain port), and then leverage socketTextStream to capture and read the data. Once you have read the data, you can further apply other transformations or output operations.

You can also think of leveraging Flume.
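The socket route can be sketched with plain java.net: a small feeder that serves the lines of a file to the first client that connects, one line at a time. The class name and method are made up for illustration; a Spark job created with jssc.socketTextStream("localhost", port) would receive one element per line.

```java
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSocketFeeder {
    // Accept one client on the given port, stream the file's lines to it
    // (newline-terminated, as socketTextStream expects), then close.
    public static void serveOnce(Path file, int port) throws Exception {
        try (ServerSocket server = new ServerSocket(port);
             Socket client = server.accept();
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            for (String line : Files.readAllLines(file)) {
                out.println(line);
            }
        }
    }
}
```

This only handles a single client and sends the file once; a real feeder would typically loop and handle reconnects, since the Spark receiver reconnects when the socket closes.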
Refer to the API Documentation for more details.