Does Apache NiFi support batch processing?
Question
I need to know whether Apache NiFi supports running each processor to completion before the next one starts.
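In other words, the execution of a series of processors in a process group should wait until the preceding execution has completed.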
For example, suppose there are three processors in the NiFi UI:
P1-->P2-->P3
P-->Processor
I need P1 to run first. Once it has completely finished, P2 should run, and then P3. The processors should run in sequence, each one waiting for the previous one to complete.
EDIT-1:
For example, I have data at a web URL, which I can download with the GetHTTP processor. I then store it on disk with PutFile. Once the file has been saved in the PutFile directory, FetchFile should run to process that file into my database, as in the workflow below.
GetHTTP-->PutFile-->FetchFile-->DB
Is this possible?
Recommended answer
NiFi itself is not really a batch processing system; it is a data flow system geared more towards continuous processing. Having said that, there are some techniques you can use to do batch-like operations, depending on which processors you're using.
The Split processors (SplitText, SplitJson, etc.) write attributes to the flow files, including "fragment.identifier", which is shared by all splits created from one incoming flow file and unique to that set, and "fragment.count", which is the total number of those splits. Processors like MergeContent use those attributes to process a whole batch (aka fragment), so the output from those kinds of processors occurs only after an entire batch/fragment has been processed.
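To make the fragment idea concrete, here is a minimal Python sketch of the bookkeeping involved. It is not NiFi code and does not use any NiFi API: the Defragmenter class and the split_lines helper are hypothetical stand-ins for a merge step and a split step. It also assumes each fragment carries a "fragment.index" attribute alongside the two attributes mentioned above, so the merged output can be ordered.

```python
from collections import defaultdict


class Defragmenter:
    """Hypothetical stand-in for a merge step: holds fragments until a batch is complete."""

    def __init__(self):
        # Maps fragment.identifier -> {fragment.index: content}
        self._pending = defaultdict(dict)

    def offer(self, attributes, content):
        """Accept one fragment; return the merged batch once all fragments arrived, else None."""
        batch_id = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        index = int(attributes["fragment.index"])  # assumed attribute, used for ordering
        bucket = self._pending[batch_id]
        bucket[index] = content
        if len(bucket) < expected:
            return None  # batch still incomplete, nothing is emitted downstream
        del self._pending[batch_id]
        return "".join(bucket[i] for i in sorted(bucket))


def split_lines(text, batch_id):
    """Hypothetical stand-in for a split step: one fragment per line, tagged with batch attributes."""
    lines = text.splitlines(keepends=True)
    return [
        (
            {
                "fragment.identifier": batch_id,
                "fragment.index": str(i),
                "fragment.count": str(len(lines)),
            },
            line,
        )
        for i, line in enumerate(lines)
    ]


merger = Defragmenter()
results = [merger.offer(attrs, body) for attrs, body in split_lines("a\nb\nc\n", "batch-1")]
# Only the final fragment completes the batch, so only the last entry is not None.
```

The point of the sketch is that nothing leaves the merge step until the number of collected fragments matches "fragment.count", which is what gives the flow its batch-like behavior.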
Another technique is to write an empty file to a temp directory when the job is complete; a ListFile processor pointing at that temp directory would then issue a flow file when the file is detected.
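The marker-file pattern can be sketched outside NiFi in a few lines of Python. Everything here is an assumption made for illustration: the marker name "_DONE", the sample data file, and the run_job and poll_for_marker helpers, where poll_for_marker only imitates what one scheduled run of a listing processor would observe.

```python
import tempfile
from pathlib import Path

MARKER_NAME = "_DONE"  # hypothetical marker file name


def run_job(output_dir: Path, marker_dir: Path) -> None:
    """Stand-in for the upstream job: write the data first, then drop an empty marker."""
    (output_dir / "data.csv").write_text("id,value\n1,42\n")
    # The marker is written last, so its presence implies the data is complete.
    (marker_dir / MARKER_NAME).touch()


def poll_for_marker(marker_dir: Path) -> bool:
    """Stand-in for one listing pass: report whether the marker file has appeared."""
    return (marker_dir / MARKER_NAME).exists()


with tempfile.TemporaryDirectory() as out, tempfile.TemporaryDirectory() as tmp:
    output_dir, marker_dir = Path(out), Path(tmp)
    before = poll_for_marker(marker_dir)  # job has not run yet, no marker
    run_job(output_dir, marker_dir)
    after = poll_for_marker(marker_dir)   # marker present, downstream step may start
```

Keeping the marker in a separate directory from the data means the listing step never picks up a half-written data file by mistake.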
Can you describe more about the processors in your flow, and how you would know when a batch is complete?