Can I run a Kafka Streams application on the same machine as a Kafka broker?

Question

I have a Kafka Streams application that reads data from a few topics, joins it, and writes the result to another topic.

Kafka configuration:

5 Kafka brokers
Kafka topics - 15 partitions and a replication factor of 3

Note: I am running the Kafka Streams application on the same machines where the Kafka brokers are running.

A few million records are consumed and produced every hour. Whenever I take any Kafka broker down, the application goes into rebalancing. The rebalance takes approximately 30 minutes, sometimes longer, and it often kills many of the Kafka Streams processes.

Recommended answer

To answer the question in the title:

If you come from a Spark/HDFS background, I think this requires a change of thinking, since you are used to the idea that processing should run where the data lives, to take advantage of data locality. Here, the brokers hold the data but still have to send it to the Kafka Streams cluster for processing, so some of that locality benefit is lost. However, keeping them separate allows you to manage the two clusters independently.
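To make the "separate clusters" point concrete, here is a minimal sketch of the client settings for a Kafka Streams application running on its own host. It only assumes that the application reaches the brokers over the network through `bootstrap.servers`; the application id and the broker hostnames are hypothetical placeholders, not values from the question.

```java
import java.util.Properties;

public class SeparateStreamsHostConfig {

    // Builds client properties for a Streams application that runs on a
    // machine other than the brokers. The keys are standard Kafka Streams
    // configuration names; the values are hypothetical.
    public static Properties streamsProperties() {
        Properties props = new Properties();
        // Hypothetical application id; instances sharing it split the work.
        props.put("application.id", "join-app");
        // Hypothetical broker addresses; the application is just a network client.
        props.put("bootstrap.servers",
                "broker1.example.internal:9092,broker2.example.internal:9092");
        return props;
    }

    public static void main(String[] args) {
        Properties props = streamsProperties();
        System.out.println(props.getProperty("bootstrap.servers"));
    }
}
```

In a real application these properties would be passed to `new KafkaStreams(topology, props)`. Nothing in them ties the application to a broker host, which is what makes the two clusters manageable separately.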

Think of a cluster that runs high-latency processing jobs and shares data and processing (e.g. an HDFS + YARN cluster): there you bring "the process to where the data is" and not the opposite. You can allocate resources for your data processing, but the idea is that your processing does not depend on temporary data spikes (as it does with streaming) but on the total data volume. If your data grows, your calculations take longer and you can allocate more resources, but storage and processing grow together. In a streaming application, however, the necessary processing power does depend on data spikes (and on your low-latency requirements) and not on the total data volume. It therefore makes sense to size and manage storage and processing separately, since their elasticity demands are not driven by the same dimension.
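The difference in scaling dimensions can be illustrated with a small back-of-the-envelope sketch. This is not a sizing method from the answer: the formulas are deliberately simplified, and every number except the replication factor of 3 is a hypothetical assumption.

```java
public class ElasticitySketch {

    // Processing is driven by the peak input rate: the number of Streams
    // instances needed is ceil(peak rate / per-instance throughput).
    public static long streamInstancesNeeded(long peakRecordsPerSec,
                                             long recordsPerSecPerInstance) {
        return (peakRecordsPerSec + recordsPerSecPerInstance - 1)
                / recordsPerSecPerInstance;
    }

    // Storage is driven by the total retained volume, multiplied by the
    // replication factor.
    public static long brokerStorageBytes(long retainedRecords,
                                          long avgRecordBytes,
                                          int replicationFactor) {
        return retainedRecords * avgRecordBytes * replicationFactor;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: the retained volume stays the same while
        // the peak rate doubles.
        long storage = brokerStorageBytes(100_000_000L, 1_000L, 3);
        System.out.println(streamInstancesNeeded(2_000, 1_000)); // prints 2
        System.out.println(streamInstancesNeeded(4_000, 1_000)); // prints 4
        System.out.println(storage); // prints 300000000000
    }
}
```

Doubling the peak rate doubles the processing requirement while the storage requirement stays the same, which is why the two tiers have different elasticity demands.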

This is separate from the obvious fact that running both data handling (the Kafka broker) and data processing (Kafka Streams) on the same node puts more load on that node. We are assuming here that this was taken into account when sizing your nodes.
