How to choose a Key and Offset for a Kafka Producer

Question

I'm following the example here. While following the code, I came up with two questions:

  1. Are the key and the offset the same thing?

According to Google:

Offset: A Kafka topic receives messages across a distributed set of partitions where they are stored. Each partition maintains the messages it has received in a sequential order where they are identified by an offset, also known as a position.
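
The offset definition above can be illustrated with a toy in-memory model. This is plain Java, not the Kafka client API, and the `ToyTopic` class is hypothetical: each partition is an append-only list, and a record's offset is simply its position in that list, assigned when the record is appended rather than chosen by the producer.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical toy model of a topic's partitions (not the Kafka API).
 * Each partition is an append-only list; a record's offset is its index
 * in that list, assigned at append time.
 */
class ToyTopic {
    private final List<List<String>> partitions = new ArrayList<>();

    ToyTopic(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends a value to a partition and returns the offset it was stored at. */
    long append(int partition, String value) {
        List<String> log = partitions.get(partition);
        log.add(value);
        return log.size() - 1; // offsets start at 0 and grow by 1 per record in this partition
    }

    /** Reads the record stored at the given offset of a partition. */
    String read(int partition, long offset) {
        return partitions.get(partition).get((int) offset);
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(2);
        System.out.println(topic.append(0, "a")); // 0
        System.out.println(topic.append(0, "b")); // 1
        System.out.println(topic.append(1, "c")); // 0: offsets are counted per partition
        System.out.println(topic.read(0, 1));     // b
    }
}
```

Note that the offset only identifies a record together with its partition: offset 0 exists in both partitions here.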

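To me, the two seem similar: the offset identifies a unique message within a partition, and the producer sends a record to a partition based on the record's key.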

  2. What is the best way to choose a key/offset for a producer?

For instance, in the example I linked above, they chose the timestamp as both the key and the offset. Is this always the best recommendation?

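```java
// Listener from the example connector. queue, topic, TIMESTAMP_FIELD, CHANNEL_FIELD,
// KEY_SCHEMA and IRCMessage are defined elsewhere in the connector (not shown).
class IRCMessageListener extends IRCEventAdapter {
    @Override
    public void onPrivmsg(String channel, IRCUser u, String msg) {
        IRCMessage event = new IRCMessage(channel, u, msg);
        //FIXME kafka round robin default partitioner seems to always publish to partition 0 only (?)
        long ts = event.getInt64("timestamp");
        // Connect source offset: the connector's position in the IRC stream
        Map<String, ?> srcOffset = Collections.singletonMap(TIMESTAMP_FIELD, ts);
        // Connect source partition: the IRC channel the message came from
        Map<String, ?> srcPartition = Collections.singletonMap(CHANNEL_FIELD, channel);
        // Record key = ts; the key (not the source offset) is what the default partitioner hashes
        SourceRecord record = new SourceRecord(srcPartition, srcOffset, topic, KEY_SCHEMA, ts, IRCMessage.SCHEMA, event);
        queue.offer(record);
    }
}
```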

I'm asking because I'm actually trying to create a custom Kafka connector that gets data from a third-party WebSocket API. The API sends real-time streaming messages for a given key value, so I thought of using that key as my partition key as well as my offset. But I need to make sure my thinking is right.
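
In Kafka Connect, the `sourcePartition`/`sourceOffset` maps passed to a `SourceRecord` are the connector's own bookmark in the upstream system, which Connect stores so a restarted task can look up where it left off. They are separate from both the record key and the offset Kafka assigns within a topic partition. The sketch below is a JDK-only illustration of how one WebSocket message might be mapped, assuming the API supplies a key and a per-key sequence number; `ApiMessage`, `OutgoingRecord`, `WebSocketRecordMapper` and the field names `apiKey`/`seq` are all hypothetical stand-ins, not part of Kafka or of any WebSocket API.

```java
import java.util.Map;

class WebSocketRecordMapper {
    // Hypothetical: one message from the third-party WebSocket API.
    record ApiMessage(String key, long seq, String payload) {}

    // Hypothetical: mirrors the fields a Kafka Connect SourceRecord is built from.
    record OutgoingRecord(Map<String, ?> sourcePartition, Map<String, ?> sourceOffset,
                          String topic, String key, String value) {}

    private final String topic;

    WebSocketRecordMapper(String topic) {
        this.topic = topic;
    }

    OutgoingRecord toRecord(ApiMessage msg) {
        // Source partition: which upstream stream the message belongs to (assumed: one per API key).
        Map<String, ?> sourcePartition = Map.of("apiKey", msg.key());
        // Source offset: the connector's position within that upstream stream (assumed sequence number).
        Map<String, ?> sourceOffset = Map.of("seq", msg.seq());
        // Record key: hashed by the default partitioner, so every message with the same
        // API key goes to the same topic partition. Kafka assigns the topic-partition
        // offset itself when the record is written.
        return new OutgoingRecord(sourcePartition, sourceOffset, topic, msg.key(), msg.payload());
    }

    public static void main(String[] args) {
        WebSocketRecordMapper mapper = new WebSocketRecordMapper("stream-topic");
        OutgoingRecord r = mapper.toRecord(new ApiMessage("stream-A", 42L, "{\"value\":1}"));
        System.out.println(r.key() + " " + r.sourceOffset()); // stream-A {seq=42}
    }
}
```

Under these assumptions the API key serves as the record key, while the source offset is whatever lets the connector tell how far it has read in the upstream feed.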

Recommended Answer

The key is optional metadata that can be sent with a Kafka message; by default, it is used to route the message to a specific partition. For example, if you send a message m with key k to a topic mytopic that has p partitions, then m goes to partition Hash(k) % p of mytopic (the Java client's default partitioner uses a murmur2 hash of the serialized key). The key has no connection whatsoever to a partition's offset: offsets are assigned sequentially as records are appended to a partition, and consumers use them to keep track of the position of the last message they read in that partition. In your case, if the timestamps are fairly randomly distributed, using them as the key is fine; otherwise, you may cause partition imbalance.
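
The routing rule partition = Hash(k) % p can be sketched in plain Java. This is an illustration under a stated assumption: `Arrays.hashCode` stands in for the murmur2 hash that the Java client's default partitioner actually applies to the serialized key, so the partition numbers will not match a real cluster. `KeyPartitioning` is a hypothetical helper. The sketch shows that equal keys always map to the same partition, independent of any offset, and that a few hot keys pile records into one partition.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class KeyPartitioning {
    // Stand-in for Hash(k) % p. Kafka uses murmur2 instead of Arrays.hashCode,
    // but likewise masks the sign bit before taking the modulus.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions; // non-negative result in [0, numPartitions)
    }

    // Counts how many of the given keys land in each partition.
    static int[] distribution(String[] keys, int numPartitions) {
        int[] counts = new int[numPartitions];
        for (String key : keys) {
            counts[partitionFor(key, numPartitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int p = 3;
        // Same key, same partition, every time.
        System.out.println(partitionFor("user-17", p) == partitionFor("user-17", p)); // true

        // Skewed keys: 8 of 10 records share one key, so one partition receives at least 8.
        String[] skewed = {"hot", "hot", "hot", "hot", "hot", "hot", "hot", "hot", "a", "b"};
        System.out.println(Arrays.toString(distribution(skewed, p)));
    }
}
```

Keys with many distinct values (such as most timestamps) tend to spread across partitions; a small set of dominant keys is what causes the imbalance the answer warns about.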
