Data Modeling with Kafka? Topics and Partitions

Published: 2021/11/12 1:51:25. Tags: apache-kafka, apache-zookeeper, data-modeling


Question

One of the first things I think about when using a new service (such as a non-RDBMS data store or a message queue) is: "How should I structure my data?"

I've read and watched some introductory material. Take, for example, "Kafka: a Distributed Messaging System for Log Processing", which states:

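"A topic is the container with which messages are associated."
"The smallest unit of parallelism is the partition of a topic. This implies that all messages that ... belong to a particular partition of a topic will be consumed by a consumer in a consumer group."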

Knowing this, what would be a good example that illustrates how to use topics and partitions? When should something be a topic? When should something be a partition?

As an example, let's say my (Clojure) data looks like this:

{:user-id 101 :viewed "/page1.html" :at #inst "2013-04-12T23:20:50.22Z"}
{:user-id 102 :viewed "/page2.html" :at #inst "2013-04-12T23:20:55.50Z"}

Should the topic be based on user-id? On viewed? On at? What about the partition?

How do I decide?

Recommended Answer

How you structure your data for Kafka really depends on how it's meant to be consumed.

In my mind, a topic is a grouping of messages of a similar type that will be consumed by the same type of consumer. In the example above I would just have a single topic; if you later decide to push some other kind of data through Kafka, you can add a new topic for it then.
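The "one topic per kind of message" idea can be sketched in a few lines of plain Python. This is only an illustration: the topic names `page-views` and `purchases`, the `TOPIC_BY_EVENT_TYPE` table and the `topic_for` helper are assumptions made up for this example and are not part of Kafka or of the question.

```python
from datetime import datetime, timezone

# Hypothetical mapping: one topic per KIND of message, never one per user.
TOPIC_BY_EVENT_TYPE = {
    "page-view": "page-views",
    # A new kind of data would get its own topic later, for example:
    # "purchase": "purchases",
}


def topic_for(event_type: str) -> str:
    """Return the topic an event type is published to; reject unknown types."""
    try:
        return TOPIC_BY_EVENT_TYPE[event_type]
    except KeyError:
        raise ValueError(f"no topic defined for event type {event_type!r}") from None


# The two page views from the question, as Python dicts.
events = [
    {"user-id": 101, "viewed": "/page1.html",
     "at": datetime(2013, 4, 12, 23, 20, 50, 220000, tzinfo=timezone.utc)},
    {"user-id": 102, "viewed": "/page2.html",
     "at": datetime(2013, 4, 12, 23, 20, 55, 500000, tzinfo=timezone.utc)},
]

# Both events are page views, so both end up in the same topic,
# regardless of which user produced them.
topics_used = {topic_for("page-view") for _ in events}
```

Here `user-id`, `viewed` and `at` stay fields inside the message; none of them becomes part of the topic name.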

Topics are registered in ZooKeeper, which means you might run into issues if you try to add too many of them, for example if you have a million users and decide to create one topic per user.

Partitions, on the other hand, are a way to parallelize the consumption of messages. For partitioning to pay off, the total number of partitions of the topic needs to be at least as large as the number of consumers in a consumer group; any consumers beyond that have nothing to read. Consumers in a consumer group split the work of processing the topic among themselves according to the partitioning, so each consumer only deals with messages in the partitions it is assigned to.
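To make the relationship between partition count and consumer count concrete, here is a minimal sketch of how partitions might be divided within one consumer group. It is a simplified round-robin model written for this example, not Kafka's actual assignment logic, and `assign_partitions` is a hypothetical helper.

```python
def assign_partitions(num_partitions, consumers):
    """Give every partition exactly one owner, spreading them round-robin.

    Simplified model of a consumer group; real Kafka assignors are more
    involved, but the invariant is the same: one partition, one consumer.
    """
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# 4 partitions, 2 consumers: each consumer handles two partitions.
balanced = assign_partitions(4, ["c1", "c2"])
# balanced == {"c1": [0, 2], "c2": [1, 3]}

# 2 partitions, 3 consumers: the third consumer gets nothing to do.
oversubscribed = assign_partitions(2, ["c1", "c2", "c3"])
# oversubscribed == {"c1": [0], "c2": [1], "c3": []}
```

The second call shows why the partition count should not be smaller than the number of consumers: parallelism is capped by the number of partitions.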

The partition can be chosen explicitly on the producer side by supplying a partition key; if no key is provided, a random partition is selected for each message.
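The effect of a partition key can be sketched as follows. This is a model of the behaviour described above, not the Kafka client itself: `choose_partition` is a hypothetical helper, and CRC32 is used only as a convenient stable hash (the real client uses its own hash function). The point is that a given key always maps to the same partition, while keyless messages are scattered.

```python
import random
import zlib


def choose_partition(key, num_partitions, rng=random):
    """Pick a partition for one message.

    With a key: hash it, so equal keys always land in the same partition.
    Without a key: pick a partition at random for this message.
    """
    if key is None:
        return rng.randrange(num_partitions)
    digest = zlib.crc32(str(key).encode("utf-8"))  # stand-in for Kafka's hash
    return digest % num_partitions


NUM_PARTITIONS = 4  # assumed partition count for the example topic

# Keyed by user-id: every page view of user 101 goes to one partition,
# so a single consumer sees that user's views in order.
first = choose_partition(101, NUM_PARTITIONS)
second = choose_partition(101, NUM_PARTITIONS)
same_partition = first == second  # True

# No key: each message may land in a different partition.
keyless = [choose_partition(None, NUM_PARTITIONS) for _ in range(10)]
```

For the page-view data in the question, using `user-id` as the key would be a reasonable choice if consumers need all events of one user together; if they do not, sending messages without a key spreads the load without further thought.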
