Does Apache NiFi support batch processing?

Question

I need to know whether Apache NiFi supports running processors until completion.

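In other words, the execution of a series of processors in one process group should wait until another process group has finished executing.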

For example:

Suppose there are three processors in the NiFi UI.

    P1-->P2-->P3
    P-->Processor

I need to run P1 first; once it has run to completion, P2 should run, and so on. The processors should run in sequence, each one waiting for the previous one to complete.

EDIT-1:

For example, I have data at a web URL, which I can download using the GetHTTP processor. I then store it with PutFile. Once the file has been saved in the PutFile directory, FetchFile should run and process that file into my database, as in the workflow below.

GetHTTP-->PutFile-->FetchFile-->DB

Is this possible?

Recommended answer

NiFi itself is not really a batch processing system; it is a data flow system geared more towards continuous processing. That said, there are some techniques you can use to do batch-like operations, depending on which processors you're using.

The split processors (SplitText, SplitJson, etc.) write attributes to the flow files they emit, including "fragment.identifier", which is shared by all splits created from one incoming flow file, and "fragment.count", which is the total number of those splits. Processors like MergeContent (with its Defragment merge strategy) use those attributes to process a whole batch (aka fragment), so the output from those kinds of processors occurs only after an entire batch/fragment has been processed.
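To make the fragment idea concrete, here is a minimal Python sketch of the bookkeeping involved. It is not NiFi code and does not use any NiFi API: the dict layout and the function name `collect_complete_batches` are assumptions made purely for illustration. It only shows how "fragment.identifier" and "fragment.count" together let a downstream step tell when a whole batch has arrived.

```python
from collections import defaultdict


def collect_complete_batches(flow_files):
    """Group flow-file-like dicts by fragment.identifier.

    Hypothetical helper for illustration only (not a NiFi API).
    Returns (complete, pending): batches whose size has reached
    fragment.count, and batches that are still waiting for splits.
    """
    pending = defaultdict(list)
    complete = {}
    for flow_file in flow_files:
        attrs = flow_file["attributes"]
        batch_id = attrs["fragment.identifier"]
        expected = int(attrs["fragment.count"])
        pending[batch_id].append(flow_file)
        # A batch is released only once every split has been seen.
        if len(pending[batch_id]) == expected:
            complete[batch_id] = pending.pop(batch_id)
    return complete, dict(pending)
```

In a real flow this waiting is done by the processor itself; the sketch only mirrors the idea that nothing is emitted for a batch until all of its fragments are present.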

Another technique is to write an empty file to a temp directory when the job is complete; a ListFile processor pointing at that temp directory will then emit a flow file when it detects the file.
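The marker-file idea can be sketched outside NiFi as follows. This is only an illustration under assumptions: the function names, the ".done" suffix, and the directory layout are all hypothetical, and in an actual flow the detection side would be the ListFile processor rather than Python code.

```python
from pathlib import Path


def write_done_marker(marker_dir, job_name):
    """Create an empty '<job_name>.done' file to signal that a job finished.

    The '.done' suffix is an assumed convention, not something NiFi requires.
    """
    marker_dir = Path(marker_dir)
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / f"{job_name}.done"
    marker.touch()  # empty file; only its existence matters
    return marker


def find_done_markers(marker_dir):
    """List marker file names, standing in for what a directory listing would detect."""
    return sorted(p.name for p in Path(marker_dir).glob("*.done"))
```

The job that produces the batch would call something like `write_done_marker` as its last step, so the marker appears only after all the real output has been written.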

Can you describe more about the processors in your flow, and how you would know when a batch is complete?
