How to read files from GetFilesProcessor in NiFi
Question
Below is my flow:
GetFile > ExecuteSparkInteractive > PutFile
I want to read the files picked up by the GetFile processor inside the ExecuteSparkInteractive processor, apply some transformations, and put the result in some location.
I wrote the following Spark Scala code in the Code section of the Spark processor:
val sc1=sc.textFile("local_path")
sc1.foreach(println)
Nothing happens in the flow. So how can I read the files from the GetFile processor in the Spark processor?
Second part:
I tried the flow below just for practice:
ExecuteScript > PutFile > LogMessage
and I put the following code in the ExecuteScript processor:
readFile = open("/home/cloudera/Desktop/sample/data","r")
for line in readFile:
    lines = line.strip()
    finalline = re.sub(pattern='((?<=[0-9])[0-9]|(?<=\.)[0-9])',repl='X',string=lines)
readFile = open("/home/cloudera/Desktop/sample/data","w")
readFile.write(finalline)
The code runs without errors, but it doesn't write the formatted data to the destination folder. Where am I going wrong? Also, I installed pandas on my local machine and ran pandas code from the ExecuteScript processor, but NiFi can't load the pandas module. Why is that? I tried my best, and I couldn't find any relevant links that describe a basic flow for this.
Recommended answer
This is not really how it works. GetFile picks up files local to the NiFi node and brings them into the NiFi flow for processing. ExecuteSparkInteractive kicks off a Spark job on a remote Spark cluster; it does not transfer data to Spark. So you would likely want to put the data somewhere Spark can access it, for example GetFile -> PutHDFS -> ExecuteSparkInteractive.
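On the second part of the question, two things stand out in the posted script, though this is a reading of the snippet and not something confirmed by the asker. First, it reopens the same input file in write mode and writes only `finalline`, so at most one masked line can end up in the file. Second, it never hands anything to the NiFi session, so PutFile likely has no flow file to write. The sketch below is plain Python run outside NiFi and only illustrates the read, mask, and write logic with separate input and output paths. The helper names `mask_digits` and `mask_file` and the sample data are hypothetical, and it strips only the trailing newline instead of calling `strip()`.

```python
import os
import re
import tempfile

# Same pattern as in the question: any digit preceded by a digit or a dot.
DIGIT_PATTERN = re.compile(r'((?<=[0-9])[0-9]|(?<=\.)[0-9])')


def mask_digits(line):
    """Replace every digit that follows a digit or a dot with 'X'."""
    return DIGIT_PATTERN.sub('X', line)


def mask_file(src_path, dst_path):
    """Read src_path line by line and write the masked lines to dst_path."""
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        for line in src:
            # Keep one output line per input line.
            dst.write(mask_digits(line.rstrip('\n')) + '\n')


if __name__ == '__main__':
    # Throwaway files with made-up data, used only to exercise the helpers.
    workdir = tempfile.mkdtemp()
    src = os.path.join(workdir, 'data')
    dst = os.path.join(workdir, 'data.masked')
    with open(src, 'w') as f:
        f.write('card 12345.67\nid 7\n')
    mask_file(src, dst)
    with open(dst) as f:
        print(f.read())
```

As for pandas: the Python engine in ExecuteScript is Jython, which cannot load CPython modules that rely on native extensions, and pandas is one of those. That would explain the import failure even though pandas is installed on the machine.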