Does Spark Streaming work with both "cp" and "mv"?
Question
I am using Spark Streaming.
My program continuously reads a stream from a Hadoop folder. The problem is that if I copy files into my Hadoop folder (hadoop fs -copyFromLocal), the Spark job starts, but if I move them instead (hadoop fs -mv /hadoopsourcePath/* /destinationPath/), it does not work.
Is this a limitation of Spark Streaming?
I have another question related to Spark Streaming: can Spark Streaming pick specific files?
Answer
Got it. It works in Spark 1.5, but it only picks files whose modification timestamp matches the current timestamp.
For example:

Temp folder: file f.txt (timestamp t1: when the file was created)
Spark input folder: /input
When you do a move (hadoop fs -mv /temp/f.txt /input), Spark will not pick the file.
But if you change the timestamp of the moved file after moving it, Spark will pick it up.
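The behaviour above can be reproduced on a plain local filesystem, which stands in here for HDFS (an assumption for illustration; hadoop fs -mv likewise preserves modification times). The sketch shows that a move keeps the file's original mtime, while explicitly "touching" it afterwards gives it a current timestamp, which is the workaround described above:

```python
import os
import shutil
import tempfile
import time

temp_dir = tempfile.mkdtemp()   # stands in for the "Temp" folder
input_dir = tempfile.mkdtemp()  # stands in for the Spark /input folder

src = os.path.join(temp_dir, "f.txt")
with open(src, "w") as fh:
    fh.write("data")
old_mtime = os.path.getmtime(src)

time.sleep(1.1)  # let the clock advance past the file's creation time

# A move keeps the original modification time, so the file still looks "old".
moved = os.path.join(input_dir, "f.txt")
shutil.move(src, moved)
mtime_after_move = os.path.getmtime(moved)

# "Touching" the file after the move gives it a current timestamp.
now = time.time()
os.utime(moved, (now, now))
```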
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala
I had to check the Spark source code.
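The check in question can be paraphrased as a small predicate. This is a hedged simplification of what FileInputDStream does, not Spark's actual API: a file is selected only when its modification time falls inside the window the stream still remembers, so a moved file carrying an old mtime is silently ignored.

```python
def is_selected(mod_time, current_batch_time, remember_window):
    """Simplified stand-in for FileInputDStream's file-age check:
    pick a file only if its modification time lies within
    (current_batch_time - remember_window, current_batch_time]."""
    return current_batch_time - remember_window < mod_time <= current_batch_time

# A freshly copied file gets a new mtime and is picked:
copied_ok = is_selected(mod_time=99, current_batch_time=100, remember_window=60)

# A moved file keeps its old mtime and is ignored:
moved_ok = is_selected(mod_time=10, current_batch_time=100, remember_window=60)
```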