Spark Streaming: StreamingContext doesn't read data files


Question

I'm new to Spark Streaming and I'm trying to get started with it using the Spark shell. Assume I have a directory called "dataTest" placed in the root directory of spark-1.2.0-bin-hadoop2.4.

The simple code that I want to test in the shell is (after typing $.\bin\spark-shell):

import org.apache.spark.streaming._
val ssc = new StreamingContext(sc, Seconds(2))
val data = ssc.textFileStream("dataTest")
println("Nb lines is equal to= "+data.count())
data.foreachRDD { (rdd, time) => println(rdd.count()) }
ssc.start()
ssc.awaitTermination()

Then I copy some files into the directory "dataTest" (and I also tried renaming some existing files in this directory).

But unfortunately I don't get what I want (i.e., I don't get any output, so it seems like ssc.textFileStream doesn't work correctly), just some things like:

15/01/15 19:32:46 INFO JobScheduler: Added jobs for time 1421346766000 ms
15/01/15 19:32:46 INFO JobScheduler: Starting job streaming job 1421346766000 ms.0 from job set of time 1421346766000 ms
15/01/15 19:32:46 INFO SparkContext: Starting job: foreachRDD at <console>:20
15/01/15 19:32:46 INFO DAGScheduler: Job 69 finished: foreachRDD at <console>:20, took 0,000021 s
0
15/01/15 19:32:46 INFO JobScheduler: Finished job streaming job 1421346766000 ms.0 from job set of time 1421346766000 ms
15/01/15 19:32:46 INFO MappedRDD: Removing RDD 137 from persistence list
15/01/15 19:32:46 INFO JobScheduler: Total delay: 0,005 s for time 1421346766000 ms (execution: 0,002 s)
15/01/15 19:32:46 INFO BlockManager: Removing RDD 137
15/01/15 19:32:46 INFO UnionRDD: Removing RDD 78 from persistence list
15/01/15 19:32:46 INFO BlockManager: Removing RDD 78
15/01/15 19:32:46 INFO FileInputDStream: Cleared 1 old files that were older than 1421346706000 ms: 1421346704000 ms
15/01/15 19:32:46 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()

Answer

Copying the file using the command line, or saving it directly into the directory, worked for me. When you copy the usual way (e.g., via an IDE or file manager), the file's modification date may not be updated, and the streaming context monitors modification dates to detect new files.
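A minimal sketch of that workaround (the file names and paths below are illustrative, not from the original post): creating the file with a command-line copy, or touching an existing one, gives it a fresh modification timestamp inside the monitored directory, which is what textFileStream looks at when deciding whether a file is new.

```shell
# Assumes the Spark shell was started from the Spark root directory,
# so "dataTest" resolves to the monitored directory.
mkdir -p dataTest
echo "some test lines" > /tmp/input.txt

# Command-line copy: the file appearing in dataTest carries a current
# modification time, so the running stream can pick it up next batch.
cp /tmp/input.txt dataTest/input.txt

# Alternatively, refresh the timestamp of a file already in place:
touch dataTest/input.txt
```

Run the copy while the streaming context is running; files whose timestamps predate the stream's start are ignored.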

