Python Spark Streaming example with textFileStream does not work. Why?


Question

I am using Spark 1.3.1 and Python 2.7.

This is my first experience with Spark Streaming.

I am trying an example that reads data from a file using Spark Streaming.

Here is the link to the example: https://github.com/apache/spark/blob/master/examples/src/main/python/streaming/hdfs_wordcount.py

My code is the following:

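from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
     .setMaster("local")
     .setAppName("My app")
     .set("spark.executor.memory", "1g"))
sc = SparkContext(conf = conf)
# Streaming context with a 1-second batch interval
ssc = StreamingContext(sc, 1)
lines = ssc.textFileStream('../inputs/2.txt')
# Split each line into words and count the occurrences of each word per batch
counts = lines.flatMap(lambda line: line.split(" "))\
          .map(lambda x: (x, 1))\
          .reduceByKey(lambda a, b: a+b)
counts.pprint()
ssc.start()
ssc.awaitTermination()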

The content of the 2.txt file is the following:


a1 b1 c1 d1 e1 f1 g1
a2 b2 c2 d2 e2 f2 g2
a3 b3 c3 d3 e3 f3 g3

I expected something related to the file content to appear in the console, but nothing does. The only output is text like this every second:


-------------------------------------------
Time: 2015-09-03 15:08:18
-------------------------------------------

and Spark's logs.

Am I doing something wrong? If not, why does it not work?

Recommended answer

I faced a similar issue. What I realized is that a running StreamingContext only picks up new files: it ingests only data placed in the source directory after the stream has started.

In fact, the PySpark documentation makes this very explicit:

textFileStream(directory)

Create an input stream that monitors a Hadoop-compatible file system for new files and reads them as text files. Files must be written to the monitored directory by "moving" them from another location within the same file system. File names starting with . are ignored.
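Read against the question, this suggests two things: the code passes the path of a single, already existing file ('../inputs/2.txt'), whereas the documented argument is a directory, and the file has to arrive in that directory after the stream has started. The following is only a minimal sketch of how that might look, not a definitive fix. The helper names drop_file_into and word_counts are hypothetical (they are not part of PySpark), and the sketch assumes that a staging directory created next to the monitored directory lives on the same file system.

```python
import os
import tempfile


def drop_file_into(monitored_dir, name, text):
    """Write `text` to a staging file, then move it into `monitored_dir`.

    Hypothetical helper: the file is fully written elsewhere first, so it
    shows up in the monitored directory in one step, as a "move".
    """
    monitored_dir = os.path.abspath(monitored_dir)
    if not os.path.isdir(monitored_dir):
        os.makedirs(monitored_dir)
    # Assumption: a sibling directory is on the same file system as the
    # monitored directory, so os.rename is a plain move rather than a copy.
    staging_dir = tempfile.mkdtemp(prefix="staging-",
                                   dir=os.path.dirname(monitored_dir))
    staged_path = os.path.join(staging_dir, name)
    with open(staged_path, "w") as f:
        f.write(text)
    final_path = os.path.join(monitored_dir, name)
    os.rename(staged_path, final_path)
    os.rmdir(staging_dir)  # the staging directory is empty after the move
    return final_path


def word_counts(ssc, monitored_dir):
    """Build the word-count DStream from the question over a directory.

    `ssc` is expected to be a pyspark.streaming.StreamingContext.
    """
    lines = ssc.textFileStream(monitored_dir)  # a directory, not a file
    return (lines.flatMap(lambda line: line.split(" "))
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))
```

With the ssc from the question, one would call word_counts(ssc, '../inputs/').pprint() before ssc.start(), and then, while the stream is running, call drop_file_into('../inputs/', '2.txt', ...) from a separate process or shell. Under the assumptions above, the counts for that file should then appear in one of the following batches.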
