How to read files from the GetFile processor in NiFi

Problem description

Here is my flow:

GetFile > ExecuteSparkInteractive > PutFile

I want to read the files picked up by the GetFile processor in the ExecuteSparkInteractive processor, apply some transformations, and write the result to some location.

I wrote Spark Scala code in the Code property of the Spark processor:

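// Load the text file at "local_path" as an RDD of lines
val sc1 = sc.textFile("local_path")
// Print every line (foreach runs on the Spark executors)
sc1.foreach(println)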

Nothing happens in the flow. So how can I read, in the Spark processor, the files picked up by the GetFile processor?

Part 2:
I tried the flow below just for practice:

ExecuteScript > PutFile > LogMessage

And I put the following code in the ExecuteScript processor:

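import re

# Mask every digit that follows another digit or a '.' in each line
with open("/home/cloudera/Desktop/sample/data", "r") as readFile:
    finallines = [re.sub(r'((?<=[0-9])[0-9]|(?<=\.)[0-9])', 'X', line.strip())
                  for line in readFile]

# Overwrite the file with all masked lines, one per line
with open("/home/cloudera/Desktop/sample/data", "w") as writeFile:
    writeFile.write("\n".join(finallines))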
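
The substitution in the script replaces a digit with X whenever it directly follows another digit or a literal '.'. Because the lookbehinds inspect the original line, a digit that follows neither is left unchanged. A minimal standalone sketch of just that transformation in plain Python, useful for checking the pattern outside NiFi (the `mask_line` name is hypothetical, not part of any NiFi API):

```python
import re

# Pattern taken from the script: a digit preceded by a digit, or a digit
# preceded by '.', is replaced; lookbehinds look at the original text.
MASK_PATTERN = re.compile(r'((?<=[0-9])[0-9]|(?<=\.)[0-9])')


def mask_line(line):
    """Hypothetical helper: strip surrounding whitespace, then mask digits."""
    return MASK_PATTERN.sub('X', line.strip())


if __name__ == "__main__":
    print(mask_line("12.345"))  # 1X.XXX
```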

The code runs fine, but it doesn't write the formatted data into the destination folder. So where am I going wrong? Also, I installed pandas on my local machine and ran pandas code from the ExecuteScript processor, but NiFi can't import the pandas module. Why is that? I've tried my best, and I couldn't find any relevant links that show a basic flow for this.

Recommended answer

This is not really how it works. GetFile picks up files that are local to the NiFi node and brings them into the NiFi flow for processing. ExecuteSparkInteractive kicks off a Spark job on a remote Spark cluster; it does not transfer the flow's data to Spark. So you would likely want to put the data somewhere Spark can access it, for example GetFile -> PutHDFS -> ExecuteSparkInteractive.
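
A minimal sketch of the Spark side of that flow, written in PySpark to keep one language here (the Scala equivalent is `sc.textFile(path).take(10)`). It assumes the Livy controller service behind ExecuteSparkInteractive is configured for a PySpark session, so `sc` is predefined, and that PutHDFS writes into the hypothetical directory `hdfs:///user/cloudera/nifi/incoming`; the `preview` helper is also hypothetical. The path is resolved by the Spark cluster, not by the NiFi node, and `take()` brings lines back to the driver, whereas `foreach(print)` prints on the executors, so that output is not returned to NiFi.

```python
# Code property of ExecuteSparkInteractive (assumed PySpark Livy session).
INPUT_DIR = "hdfs:///user/cloudera/nifi/incoming"  # hypothetical PutHDFS directory


def preview(spark_context, path, n=10):
    """Read all files under `path` as seen by the cluster; return the first n lines.

    take() collects those lines on the driver, so printing them there should show
    up in the Livy statement output.
    """
    return spark_context.textFile(path).take(n)


# Only runs inside a Spark session where `sc` is defined.
if "sc" in globals():
    for line in preview(sc, INPUT_DIR):
        print(line)
```

If this prints nothing, the first thing to check is whether PutHDFS actually wrote files into the directory the Spark code reads.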
