Spark Streaming: StreamingContext doesn't read data files


Question

I'm new to Spark Streaming and I'm trying to get started with it using the Spark shell. Assume I have a directory called "dataTest" placed in the root directory of spark-1.2.0-bin-hadoop2.4.

The simple code that I want to test in the shell is (after typing $.\bin\spark-shell):

import org.apache.spark.streaming._
val ssc = new StreamingContext(sc, Seconds(2))
val data = ssc.textFileStream("dataTest")
println("Nb lines is equal to= "+data.count())
data.foreachRDD { (rdd, time) => println(rdd.count()) }
ssc.start()
ssc.awaitTermination()

Then I copy some files into the directory "dataTest" (and I also tried renaming some existing files in this directory).

But unfortunately I don't get what I want (i.e., I don't get any output, so it seems like ssc.textFileStream doesn't work correctly), just some things like:

15/01/15 19:32:46 INFO JobScheduler: Added jobs for time 1421346766000 ms
15/01/15 19:32:46 INFO JobScheduler: Starting job streaming job 1421346766000 ms.0 from job set of time 1421346766000 ms
15/01/15 19:32:46 INFO SparkContext: Starting job: foreachRDD at <console>:20
15/01/15 19:32:46 INFO DAGScheduler: Job 69 finished: foreachRDD at <console>:20, took 0,000021 s
0
15/01/15 19:32:46 INFO JobScheduler: Finished job streaming job 1421346766000 ms.0 from job set of time 1421346766000 ms
15/01/15 19:32:46 INFO MappedRDD: Removing RDD 137 from persistence list
15/01/15 19:32:46 INFO JobScheduler: Total delay: 0,005 s for time 1421346766000 ms (execution: 0,002 s)
15/01/15 19:32:46 INFO BlockManager: Removing RDD 137
15/01/15 19:32:46 INFO UnionRDD: Removing RDD 78 from persistence list
15/01/15 19:32:46 INFO BlockManager: Removing RDD 78
15/01/15 19:32:46 INFO FileInputDStream: Cleared 1 old files that were older than 1421346706000 ms: 1421346704000 ms
15/01/15 19:32:46 INFO ReceivedBlockTracker: Deleting batches ArrayBuffer()

Answer

Copying the file using the command line, or saving it directly into the directory, worked for me. When you copy the usual way (e.g., via an IDE or file manager), the file's modification date may not be updated, and the streaming context monitors modification dates to detect new files.
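A minimal sketch of that workaround (the file names and paths below are illustrative, not from the original post): creating the file with a command-line copy, or touching an existing one, gives it a fresh modification timestamp inside the monitored directory, which is what textFileStream looks at when deciding whether a file is new.

```shell
# Assumes the Spark shell was started from the Spark root directory,
# so "dataTest" resolves to the monitored directory.
mkdir -p dataTest
echo "some test lines" > /tmp/input.txt

# Command-line copy: the file appearing in dataTest carries a current
# modification time, so the running stream can pick it up next batch.
cp /tmp/input.txt dataTest/input.txt

# Alternatively, refresh the timestamp of a file already in place:
touch dataTest/input.txt
```

Run the copy while the streaming context is running; files whose timestamps predate the stream's start are ignored.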

