Print the content of streams (Spark Streaming) in a Windows system


Problem description

I just want to print the content of a stream to the console. I wrote the following code, but nothing is printed. Can anyone help me read a text file as a stream in Spark? Is there an issue on Windows systems?

public static void main(String[] args) throws Exception {

    SparkConf sparkConf = new SparkConf().setAppName("My app")
        .setMaster("local[2]")
        .setSparkHome("C:\\Spark\\spark-1.5.1-bin-hadoop2.6")
        .set("spark.executor.memory", "2g");

    JavaStreamingContext jssc = new JavaStreamingContext(sparkConf, Durations.seconds(2));

    JavaDStream<String> dataStream = jssc.textFileStream("C://testStream//copy.csv");
    dataStream.print();

    jssc.start();
    jssc.awaitTermination();
}

UPDATE: The content of copy.csv is

0,0,12,5,0
0,0,12,5,0
0,1,2,0,42
0,0,0,0,264
0,0,12,5,0

Solution

textFileStream is for monitoring a Hadoop-compatible directory. The operation watches the provided directory, and as new files are added to it, it reads/streams the data from those newly added files.
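One detail worth adding (from Spark's documented semantics for file streams, not stated in the original answer): files must appear in the watched directory atomically, so you should write them elsewhere and then move/rename them in, rather than copying or appending in place. A minimal JDK-only sketch of dropping a file in correctly (the paths here are temporary placeholders):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;

public class DropFile {
    public static void main(String[] args) throws IOException {
        // Stand-ins for a staging location and the watched directory.
        Path staging = Files.createTempFile("copy", ".csv");
        Files.write(staging, List.of("0,0,12,5,0", "0,1,2,0,42"));

        Path watched = Files.createTempDirectory("testStream");

        // Move (rename) rather than copy, so the file appears atomically
        // and the file stream never sees a half-written file.
        Path target = watched.resolve("copy.csv");
        Files.move(staging, target, StandardCopyOption.ATOMIC_MOVE);

        System.out.println(Files.exists(target));
    }
}
```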

You cannot read a single text/CSV file using textFileStream; or rather, I would say you do not need streaming at all if you are just reading a file.

My suggestion would be to monitor a directory (HDFS or a local file system), then add files to it and capture the content of those new files using textFileStream.

In your code, you could replace "C://testStream//copy.csv" with "C://testStream". Once your Spark Streaming job is up and running, add the file copy.csv to the C://testStream folder and watch the output on the Spark console.
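Applied to the code in the question, that is a one-line change (everything else is the asker's original setup, kept as-is):

```java
import org.apache.spark.SparkConf;
import org.apache.spark.streaming.Durations;
import org.apache.spark.streaming.api.java.JavaDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;

public class PrintStream {
    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("My app")
            .setMaster("local[2]")
            .set("spark.executor.memory", "2g");

        JavaStreamingContext jssc =
            new JavaStreamingContext(sparkConf, Durations.seconds(2));

        // Watch the directory, not a single file: textFileStream only
        // streams files that are newly added while the job is running.
        JavaDStream<String> dataStream = jssc.textFileStream("C://testStream");
        dataStream.print();

        jssc.start();
        jssc.awaitTermination();
    }
}
```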

OR

Alternatively, you could write another command-line Scala/Java program that reads the files and sends their content over a socket (on some port), and then use socketTextStream to capture and read the data. Once the data is read, you can apply further transformations or output operations.
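A minimal JDK-only sketch of such a feeder, under the assumption that it serves one line per record to whichever client connects (on the Spark side the client would be `jssc.socketTextStream("localhost", port)`; here a local thread plays the receiver's role so the example is self-contained):

```java
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.List;

public class SocketFeeder {
    public static void main(String[] args) throws Exception {
        // Stand-in for the CSV content read from a file.
        List<String> lines = List.of("0,0,12,5,0", "0,1,2,0,42");

        // Port 0 asks the OS for any free port; a real feeder would fix one.
        try (ServerSocket server = new ServerSocket(0)) {
            int port = server.getLocalPort();

            // In production this client would be Spark's socketTextStream
            // receiver; here a thread stands in so the sketch runs alone.
            Thread receiver = new Thread(() -> {
                try (Socket s = new Socket("localhost", port);
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        System.out.println("received: " + line);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            receiver.start();

            // Feeder side: accept one client, send one record per line.
            try (Socket client = server.accept();
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                for (String line : lines) {
                    out.println(line);
                }
            }
            receiver.join();
        }
    }
}
```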

You can also consider leveraging Flume.

Refer to the API documentation for more details.

