如何为 Kafka Producer 选择 Key 和 Offset [英] How to choose a Key and Offset for a Kafka Producer

查看:84
本文介绍了如何为 Kafka Producer 选择 Key 和 Offset的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在关注这里.同时遵循代码.我想出了两个问题

I'm following here.While following the code. I came up with two Questions

  1. Key 和 offset 是否相同?

据谷歌称,

偏移量: Kafka 主题通过分布式集合接收消息存储它们的分区.每个分区维护它按顺序接收到的消息由偏移量标识,也称为位置.

Offset: A Kafka topic receives messages across a distributed set of partitions where they are stored. Each partition maintains the messages it has received in a sequential order where they are identified by an offset, also known as a position.

对我来说两者似乎非常相似.由于 offset 在分区中维护了一条唯一消息:生产者根据记录的键将记录发送到分区

  1. 为制作人选择 Key/Offset 的最佳方法是什么?

以我上面提供的示例为例,他们选择了时间戳作为密钥和偏移量.这总是最好的推荐吗?

For an instance the example which I provided above, they have chosen the timestamp as the Key and offset. Is this the always the best recommendation?

 class IRCMessageListener extends IRCEventAdapter {
    @Override
    public void onPrivmsg(String channel, IRCUser u, String msg) {
        IRCMessage event = new IRCMessage(channel, u, msg);
        //FIXME kafka round robin default partitioner seems to always publish to partition 0 only (?)
        long ts = event.getInt64("timestamp");
        Map<String, ?> srcOffset = Collections.singletonMap(TIMESTAMP_FIELD, ts);
        Map<String, ?> srcPartition = Collections.singletonMap(CHANNEL_FIELD, channel);
        SourceRecord record = new SourceRecord(srcPartition, srcOffset, topic, KEY_SCHEMA, ts, IRCMessage.SCHEMA, event);
        queue.offer(record);
    }

因为我实际上是在尝试创建一个自定义的 Kafka 连接器来从 3rd Party WebSocket API 获取数据.API 为给定的 Key 值发送​​实时数据流消息.所以我想把那个 Key 用于我的 PartitionKey 和 Offset.但需要确保我的想法是正确的.

Because I'm actually trying to create a custom Kafka connector to get the data from 3rd Party WebSocket API. The API sends real-time data stream messages for a given Key value. So I thought of using that Key for my PartitionKey as well as Offset. But need to make sure I'm right about my thought.

推荐答案

Key 是一个可选的元数据,可以和 Kafka 消息一起发送,默认用于将消息路由到特定的分区.例如.如果您要向具有 p 个分区的主题 mytopic 发送密钥为 k 的消息 m,然后 m 转到 mytopic 中的分区 Hash(k) % p.它与分区的偏移没有任何关系.消费者使用偏移量来跟踪分区中最后读取消息的位置.在您的情况下,如果时间戳相当随机分布,那么没关系,否则您可能会在将其用作键时导致分区不平衡.

Key is an optional metadata, that can be sent with a Kafka message, and by default, it is used to route message to a specific partition. E.g. if you're sending a message m with key as k, to a topic mytopic that has p partitions, then m goes to the partition Hash(k) % p in mytopic. It has no connection to the offset of a partition whatsoever. Offsets are used by consumers to keep track of the position of last read message in a partition. In your case, if the timestamp is fairly randomly distributed, then it's fine, else you might be causing partition imbalance while using it as key.

这篇关于如何为 Kafka Producer 选择 Key 和 Offset的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆