Does Spark Streaming work with both "cp" and "mv"?


Question

I am using Spark Streaming.

My program continuously reads streams from a Hadoop folder. The problem is that if I copy files into my Hadoop folder (hadoop fs -copyFromLocal), the Spark job starts, but if I move them (hadoop fs -mv /hadoopsourcePath/* /destinationPath/), it does not work.

Is this a limitation of Spark Streaming?

I have another question related to Spark Streaming: can Spark Streaming pick up specific files only?

Recommended answer

Got it. It works in Spark 1.5, but it picks up only those files whose timestamp is equal to the current timestamp.

For example:

Temp folder: file f.txt (timestamp t1, set when the file was created)

Spark input folder: /input

When you do a move (hadoop fs -mv /temp/f.txt /input), Spark will not pick up the file.

But if you change the timestamp of the moved file after moving it, Spark will pick it up.
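The behaviour described above is consistent with a move keeping the file's original modification time, while a copy creates a new file with a fresh one. The sketch below reproduces that effect on a local filesystem with Python's standard library. It is an analogy only: it does not touch HDFS or Spark, and the function name and file names are made up for illustration.

```python
import os
import shutil
import tempfile
import time


def demo_move_vs_copy():
    """Return the age in seconds of the modification time after a move,
    after a copy, and after touching the moved file (local files only)."""
    with tempfile.TemporaryDirectory() as root:
        temp_dir = os.path.join(root, "temp")
        input_dir = os.path.join(root, "input")
        os.makedirs(temp_dir)
        os.makedirs(input_dir)

        # Create f.txt and backdate it by one hour (timestamp t1).
        src = os.path.join(temp_dir, "f.txt")
        with open(src, "w") as fh:
            fh.write("hello\n")
        t1 = time.time() - 3600
        os.utime(src, (t1, t1))

        # A copy creates a new file, so its modification time is "now".
        copied = os.path.join(input_dir, "copied.txt")
        shutil.copyfile(src, copied)
        age_after_copy = time.time() - os.path.getmtime(copied)

        # A move (rename) keeps the original modification time t1.
        moved = os.path.join(input_dir, "f.txt")
        os.rename(src, moved)
        age_after_move = time.time() - os.path.getmtime(moved)

        # Touching the moved file resets its modification time to now.
        os.utime(moved, None)
        age_after_touch = time.time() - os.path.getmtime(moved)

        return age_after_move, age_after_copy, age_after_touch
```

Calling `demo_move_vs_copy()` should show the moved file still looking about an hour old, while the copied and the touched files look new. On HDFS the modification time can be set through the Hadoop `FileSystem.setTimes` API; which shell command is available for this depends on the Hadoop version.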

https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala

I had to check the Spark source code to find this out.
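As a rough illustration of the kind of modification-time check the answer describes, here is a simplified model in Python. It is not Spark's code: `select_new_files`, `window_seconds`, and the sample file names are hypothetical, and the real selection logic lives in FileInputDStream.scala and may differ between versions.

```python
def select_new_files(mod_times, batch_time, window_seconds, already_seen=()):
    """Return the names of files whose modification time lies in
    (batch_time - window_seconds, batch_time] and that were not seen before.

    mod_times maps a file name to its modification time in seconds.
    """
    threshold = batch_time - window_seconds
    seen = set(already_seen)
    return sorted(
        name
        for name, mtime in mod_times.items()
        if name not in seen and threshold < mtime <= batch_time
    )


# Hypothetical directory listing at batch time 10000.
now = 10_000.0
listing = {
    "moved.txt": now - 3600,  # moved in, still carries its old timestamp
    "copied.txt": now - 1,    # copied in, timestamp is recent
}
recent = select_new_files(listing, now, window_seconds=60)
```

Under this model a file that arrives with an old timestamp is never selected, which matches the observation that `hadoop fs -mv` alone does not trigger processing.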
