Does Apache NiFi support batch processing?

Question

I need to know whether Apache NiFi supports running processors until completion.

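That is, a series of processors in a process group should execute in order, with each one waiting for the previous execution to complete.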

For example:

Suppose there are three processors in the NiFi UI.

    P1-->P2-->P3
    P-->Processor

I need to run P1 first and, once it has run to completion, run P2, and so on. The processors should run in sequence, each waiting for the previous one to complete.

EDIT-1:

For example, I have data at a web URL. I can download that data using the GetHTTP processor and store it with PutFile. Once the file has been saved in the PutFile directory, FetchFile should run to process that file into my database, as in the workflow below.

GetHTTP-->PutFile-->FetchFile-->DB

Is this possible?

Recommended answer

NiFi itself is not really a batch processing system; it is a data flow system geared more towards continuous processing. Having said that, there are some techniques you can use to do batch-like operations, depending on which processors you're using.

The Split processors (SplitText, SplitJson, etc.) write attributes to the flow files they emit. These include "fragment.identifier", which is shared by all splits created from one incoming flow file and unique to that batch, and "fragment.count", which is the total number of those splits. Processors like MergeContent use those attributes to process a whole batch (aka fragment), so the output from those kinds of processors occurs only after an entire batch/fragment has been processed.
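
To make the idea concrete, here is a minimal sketch in plain Python (not NiFi code) of how a downstream step could hold splits until the whole batch has arrived. The `Defragmenter` class and its `offer` method are hypothetical names invented for this illustration. The sketch also assumes each split carries a "fragment.index" attribute for ordering, which is not mentioned above.

```python
from collections import defaultdict


class Defragmenter:
    """Hypothetical helper: holds splits until every fragment of a batch is present."""

    def __init__(self):
        # fragment.identifier -> {fragment.index: content}
        self._pending = defaultdict(dict)

    def offer(self, attributes, content):
        """Accept one split; return the merged batch once it is complete, else None."""
        batch_id = attributes["fragment.identifier"]
        expected = int(attributes["fragment.count"])
        # Assumption: an index attribute is available to restore the original order.
        index = int(attributes["fragment.index"])

        bucket = self._pending[batch_id]
        bucket[index] = content
        if len(bucket) < expected:
            return None  # batch not complete yet, so nothing is emitted downstream

        del self._pending[batch_id]
        return "".join(bucket[i] for i in sorted(bucket))


if __name__ == "__main__":
    merger = Defragmenter()
    attrs = {"fragment.identifier": "batch-1", "fragment.count": "2"}
    print(merger.offer({**attrs, "fragment.index": "2"}, "world"))
    print(merger.offer({**attrs, "fragment.index": "1"}, "hello "))
```

The point of the sketch is only the gating behaviour: nothing leaves the merge step until the number of collected splits matches "fragment.count" for that "fragment.identifier".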

Another technique is to write an empty file to a temp directory when the job is complete; a ListFile processor pointing at that temp directory would then issue a flow file when it detects the file.
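
The marker-file pattern can be sketched outside NiFi as follows. This is an illustration of the idea in plain Python under stated assumptions, not how ListFile is implemented: the function names, the `.done` suffix, and the in-memory `already_seen` set are all hypothetical choices made for this example.

```python
import tempfile
from pathlib import Path


def mark_complete(marker_dir, job_name):
    """Called by the job when it finishes: drop an empty marker file."""
    marker = Path(marker_dir) / f"{job_name}.done"  # ".done" suffix is an assumption
    marker.touch()
    return marker


def completed_jobs(marker_dir, already_seen):
    """Report each newly appeared marker once, loosely like a directory listing step."""
    newly_done = []
    for marker in sorted(Path(marker_dir).glob("*.done")):
        if marker.stem not in already_seen:
            already_seen.add(marker.stem)
            newly_done.append(marker.stem)
    return newly_done


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as marker_dir:
        seen = set()
        print(completed_jobs(marker_dir, seen))  # nothing finished yet
        mark_complete(marker_dir, "download")
        print(completed_jobs(marker_dir, seen))  # the finished job is reported once
```

In a flow, the downstream work would be triggered by the listing step rather than by the upstream processor directly, which is what lets it wait for the job to finish.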

Can you describe more about the processors in your flow, and how you would know when a batch is complete?
