Designing a component that is both a producer and a consumer in Kafka

Question

I am using Kafka and ZooKeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data-processing tool for the small transformations I need to make on the data.

My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them, basically creating a summary of the topics that are digested. I also want to push this summary data to Kafka as a separate topic, but that forms a loop between Kafka and my component.

That is what bothers me: is this a desirable architecture in Kafka?

Should I instead do all the processing in Samza and store only the digested (summary) information in Kafka? The amount of processing I am going to do is quite heavy, though, which is why I want to use a separate component for it (ComponentA). I guess my question can be generalized to all kinds of data pipelines.

So, is it good practice for a component in a data pipeline to be both a consumer and a producer?

Answer

As long as Samza is writing to different topics than it is consuming from, there will be no problem. Samza jobs that read from and write to Kafka are the norm and what the architecture intends. One can also have Samza jobs that bring data in from another system, jobs that write data from Kafka out to a different system, or even jobs that don't use Kafka at all.
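As a minimal sketch of that consume-transform-produce pattern, the component below models Kafka topics as in-memory lists (a real component would use a Kafka or Samza client instead; the topic names and the event-counting digest are illustrative, not from the question):

```python
from collections import defaultdict, Counter

# In-memory stand-ins for Kafka topics (illustrative only).
topics = defaultdict(list)

def produce(topic, message):
    topics[topic].append(message)

def consume(topic):
    # Drain all currently available messages from the topic.
    msgs, topics[topic] = topics[topic], []
    return msgs

def summarizer_component(input_topics, output_topic):
    """Consume several topics, digest them, and produce the summary
    to a *different* topic -- no loop, since output_topic is never read."""
    counts = Counter()
    for t in input_topics:
        for msg in consume(t):
            counts[msg["event"]] += 1  # hypothetical digest: count events
    produce(output_topic, dict(counts))

# Feed some events into the input topics.
produce("clicks", {"event": "click"})
produce("clicks", {"event": "click"})
produce("views", {"event": "view"})

summarizer_component(["clicks", "views"], "summaries")
print(topics["summaries"])  # [{'click': 2, 'view': 1}]
```

The key property is simply that `"summaries"` is not in `input_topics`, so the component's output can never feed back into its own input.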

Having a job read from and write to the same topic, however, is where you would get a loop, and that should be avoided. It has the potential to fill up your Kafka brokers' disks very quickly.
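To see why, here is a toy simulation (with made-up numbers) of a job that emits one derived message per message it consumes. When the output topic differs from the input topic, the log stops growing once the input is processed; when they are the same topic, the job keeps re-consuming its own output and the log grows without bound:

```python
def run_cycles(read_topic, write_topic, cycles):
    """Simulate a job that, each cycle, consumes every unread message on
    read_topic and appends one derived message per input to write_topic.
    Returns the total number of messages in the topic's log."""
    appended = 1   # one seed message in read_topic
    unread = 1     # messages not yet consumed
    for _ in range(cycles):
        produced = unread  # one output per consumed input
        # Outputs become new unread input only in the same-topic case.
        unread = produced if write_topic == read_topic else 0
        appended += produced
    return appended

# Distinct topics: growth stops after the seed message is processed.
print(run_cycles("in", "out", 10))   # 2
# Same topic: every output is re-consumed, so the log never stops growing.
print(run_cycles("t", "t", 10))      # 11
```

In a real deployment the feedback is continuous, so even this "one output per input" loop steadily consumes broker disk until retention limits or disk exhaustion intervene.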
