Designing a component that is both a producer and a consumer in Kafka

Question

I am using Kafka and ZooKeeper as the main components of my data pipeline, which processes thousands of requests each second. I am using Samza as the real-time data processing tool for the small transformations I need to make on the data.

My problem is that one of my consumers (let's say ConsumerA) consumes several topics from Kafka and processes them, essentially creating a summary of the topics it digests. I further want to push this data back to Kafka as a separate topic, but that forms a loop between Kafka and my component.
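
For concreteness, a minimal sketch of such a consume-transform-produce component, written with the plain Java Kafka clients, might look like the following; the broker address, group id, topic names, and the summarization logic are placeholders for illustration, not part of the original setup:

```java
import java.time.Duration;
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SummaryComponent {
    public static void main(String[] args) {
        // Consumer configuration: reads the raw input topics.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "consumer-a");
        consumerProps.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        // Producer configuration: writes the digested summary.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {

            // Input topics (placeholders).
            consumer.subscribe(Arrays.asList("topic-a", "topic-b"));

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // Placeholder for the actual summarization logic.
                    String summary = "summary-of:" + record.value();
                    // The output topic differs from the input topics,
                    // so no feedback loop is formed.
                    producer.send(new ProducerRecord<>("summary-topic", record.key(), summary));
                }
            }
        }
    }
}
```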

This bothers me. Is this the kind of architecture that Kafka intends?

Should I rather do all the processing in Samza and store only the digested (summary) information in Kafka from Samza? The amount of processing I am going to do is quite heavy, though, which is why I want to use a separate component (ComponentA) for it. I guess my question can be generalized to all kinds of data pipelines.

So, is it good practice to have a component be both a consumer and a producer in a data pipeline?

Answer

As long as Samza is writing to different topics than it is consuming from, no, there will be no problem. Samza jobs that read from and write to Kafka are the norm and intended by the architecture. One can also have Samza jobs that bring some data in from another system, or jobs that write some data from Kafka out to a different system (or even jobs that don't use Kafka at all).
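
A minimal sketch of such a Samza job, assuming the classic low-level StreamTask API, could look like the one below; the system name ("kafka"), the output topic, and the transformation itself are illustrative assumptions rather than anything prescribed by the answer:

```java
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

// A Samza task that consumes whatever input topics the job config declares
// and writes its output to a different Kafka topic.
public class SummaryStreamTask implements StreamTask {

    // Output goes to a topic that is NOT among the job's input topics.
    private static final SystemStream OUTPUT =
            new SystemStream("kafka", "summary-topic");

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Placeholder transformation of the incoming message.
        String summary = "summary-of:" + envelope.getMessage();
        collector.send(new OutgoingMessageEnvelope(OUTPUT, summary));
    }
}
```

The job's input topics are declared separately in the job configuration (for example via task.inputs=kafka.topic-a,kafka.topic-b); as long as summary-topic is not listed there, the job reads and writes different topics and no loop is created.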

Having a job read from and write to the same topic, however, is where you would get a loop, and that should be avoided. It has the potential to fill up your Kafka brokers' disks really fast.
