Distribution of content among cluster nodes within edge NiFi processors


Problem description

I was exploring the NiFi documentation. I must agree that it is one of the better-documented open-source projects out there.

My understanding is that a processor runs on all nodes of the cluster. However, I was wondering how content is distributed among the cluster nodes when we use content-pulling processors such as FetchS3Object or FetchHDFS. With a processor like FetchHDFS or FetchSFTP, will all nodes connect to the source? Is the content split and fetched by multiple nodes, or does one node fetch the content and load balance it across the downstream queues?

Recommended answer

The answer by @dagget has traditionally been the approach to handle this situation, often referred to as the "list + fetch" pattern. The List processor runs on the Primary Node only and sends its listings to a Remote Process Group (RPG), which redistributes them across the cluster. An input port receives the listings and connects to a Fetch processor that runs on all nodes, so the nodes fetch in parallel.
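To make the division of work concrete, here is a minimal Python sketch of the "list + fetch" idea. It is only a simulation, not NiFi code: the function names, the node names, and the in-memory `source` dictionary are all made up for illustration, and plain round-robin stands in for whatever distribution the cluster actually applies.

```python
from itertools import cycle


def list_on_primary(source):
    """Stand-in for a List processor on the Primary Node only.

    It emits one lightweight listing per file (name only, no content).
    """
    return [{"filename": name} for name in sorted(source)]


def distribute_round_robin(listings, nodes):
    """Hand the listings out to the cluster nodes in round-robin order."""
    assignment = {node: [] for node in nodes}
    for node, listing in zip(cycle(nodes), listings):
        assignment[node].append(listing)
    return assignment


def fetch_on_node(listings, source):
    """Stand-in for a Fetch processor: a node pulls only the files it was given."""
    return {item["filename"]: source[item["filename"]] for item in listings}


# Hypothetical source system with five files and a three-node cluster.
source = {f"file{i}.txt": f"content {i}".encode() for i in range(1, 6)}
nodes = ["node1", "node2", "node3"]

listings = list_on_primary(source)
assignment = distribute_round_robin(listings, nodes)
fetched = {node: fetch_on_node(items, source) for node, items in assignment.items()}

for node in nodes:
    print(node, sorted(fetched[node]))
```

In this sketch every node does connect to the source, but each one fetches only the files named in the listings it received. No single file is split across nodes.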

As of 1.8.0 there are load-balanced connections, which remove the need for the RPG. You would still run the List processor on the Primary Node only, but then connect it directly to the Fetch processor and configure the queue in between to load balance.
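The load-balance setting is normally chosen in the connection's configuration dialog in the UI. If you want to script it, the sketch below builds the kind of JSON body you might send to the NiFi REST API to update a connection. Treat it as an assumption to check against the REST API documentation for your version: the field name `loadBalanceStrategy` and the strategy values reflect my understanding of the connection entity, and the connection id and revision number are placeholders.

```python
import json

# Strategy names as I understand them from the NiFi connection settings;
# verify them against the REST API docs for your NiFi version.
KNOWN_STRATEGIES = {
    "DO_NOT_LOAD_BALANCE",
    "PARTITION_BY_ATTRIBUTE",
    "ROUND_ROBIN",
    "SINGLE_NODE",
}


def load_balance_update(connection_id, revision_version, strategy="ROUND_ROBIN"):
    """Build a request body that sets the load-balance strategy of one connection."""
    if strategy not in KNOWN_STRATEGIES:
        raise ValueError(f"unknown load-balance strategy: {strategy}")
    return {
        "revision": {"version": revision_version},
        "component": {
            "id": connection_id,
            "loadBalanceStrategy": strategy,
        },
    }


# Placeholder id and revision; in practice, read both from the existing connection first.
body = load_balance_update("hypothetical-connection-id", 3)
print(json.dumps(body, indent=2))
```

The body would then be sent as an update to the connection between the List and Fetch processors. The sketch deliberately stops short of making the HTTP call, since authentication and the base URL depend on your deployment.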
