In Kafka, how to get the exact offset according to producing time


Question


I need to get the messages produced in Kafka hour by hour over a day. Every hour I will launch a job to consume the messages produced during the previous hour. For example, if the current time is 20:12, I will consume the messages produced between 19:00:00 and 19:59:59. That means I need the start offset for time 19:00:00 and the end offset for time 19:59:59. I used SimpleConsumer.getOffsetsBefore as shown in the "0.8.0 SimpleConsumer Example". The problem is that the returned offset does not match the timestamp given as a parameter. For example, when I pass 19:00:00 as the timestamp, the offset I get back points to a message produced at 16:38:00.
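Whatever way the offsets end up being found, the hourly window itself is easy to get slightly wrong (a message at 19:59:59.500 falls outside "19:00:00 to 19:59:59"). A minimal sketch, assuming timestamps are epoch milliseconds and using a half-open range [start, end); the `HourWindow` class and its methods are hypothetical helpers, not part of any Kafka API:

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.time.temporal.ChronoUnit;

/** Hypothetical helper: the previous full hour as a half-open window [startMs, endMs). */
final class HourWindow {
    final long startMs; // inclusive, e.g. 19:00:00.000
    final long endMs;   // exclusive, e.g. 20:00:00.000

    private HourWindow(long startMs, long endMs) {
        this.startMs = startMs;
        this.endMs = endMs;
    }

    /** Returns the hour before the one that contains {@code now}, in the given time zone. */
    static HourWindow previousHour(Instant now, ZoneId zone) {
        // Truncate e.g. 20:12 down to 20:00, then step back one hour on the instant timeline.
        ZonedDateTime currentHour = now.atZone(zone).truncatedTo(ChronoUnit.HOURS);
        ZonedDateTime previousHour = currentHour.minusHours(1);
        return new HourWindow(previousHour.toInstant().toEpochMilli(),
                              currentHour.toInstant().toEpochMilli());
    }

    /** True if the timestamp falls in [startMs, endMs). */
    boolean contains(long timestampMs) {
        return timestampMs >= startMs && timestampMs < endMs;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2014-01-01T20:12:00Z");
        HourWindow w = previousHour(now, ZoneId.of("UTC"));
        System.out.println(Instant.ofEpochMilli(w.startMs) + " .. " + Instant.ofEpochMilli(w.endMs));
        // prints 2014-01-01T19:00:00Z .. 2014-01-01T20:00:00Z
    }
}
```

Using an exclusive end of 20:00:00.000 instead of 19:59:59 means consecutive hourly jobs neither overlap nor leave a gap.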

Answer


In Kafka there is currently no way to get an offset that corresponds to a particular timestamp; this is by design. As described near the top of Jay Kreps's Log article, the offset number provides a sort of timestamp for the log that is decoupled from wall-clock time. With the offset as your notion of time, you can tell whether any two systems are in a consistent state just by knowing which offset each has read up to. There is never any confusion about different clock times on different servers, leap years, daylight saving time, time zones, etc. It's kinda nice...


NOW... all that said, if you know your server went down at some time X, then practically speaking you would really like to know the corresponding offset. You can get close. The log files on the Kafka machines are named according to the time that they started writing, and there exists a Kafka tool (that I can't find right now) that lets you know which offsets are associated with these files. If you want to know the exact timestamp, though, you must encode the timestamp in the messages that you send to Kafka.
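Encoding the timestamp in the message, combined with a coarse starting offset, can be sketched as follows. This is a minimal sketch under stated assumptions, not a Kafka API: the 8-byte big-endian prefix format, the `TimestampedMessages` class and the `Fetched` holder are hypothetical. The idea is to start reading from an offset that is known to be early enough (such as the segment-level offset getOffsetsBefore returns, which may be hours too early) and keep only the messages whose embedded timestamp falls in the wanted hour.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class TimestampedMessages {
    /** Hypothetical holder for a message read back from the broker: its offset and raw value. */
    static final class Fetched {
        final long offset;
        final byte[] value;
        Fetched(long offset, byte[] value) { this.offset = offset; this.value = value; }
    }

    /** Producer side: prefix the payload with an 8-byte big-endian timestamp (epoch ms). */
    static byte[] encode(long timestampMs, byte[] payload) {
        return ByteBuffer.allocate(Long.BYTES + payload.length)
                .putLong(timestampMs)
                .put(payload)
                .array();
    }

    /** Consumer side: read back the timestamp written by encode(). */
    static long timestampOf(byte[] value) {
        return ByteBuffer.wrap(value).getLong();
    }

    /** Consumer side: strip the 8-byte prefix and return the original payload. */
    static byte[] payloadOf(byte[] value) {
        byte[] payload = new byte[value.length - Long.BYTES];
        System.arraycopy(value, Long.BYTES, payload, 0, payload.length);
        return payload;
    }

    /** Keeps the offsets whose embedded timestamp falls in [startMs, endMs). */
    static List<Long> offsetsInWindow(List<Fetched> fetched, long startMs, long endMs) {
        List<Long> result = new ArrayList<>();
        for (Fetched f : fetched) {
            long ts = timestampOf(f.value);
            if (ts >= startMs && ts < endMs) {
                result.add(f.offset);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long hour = 3_600_000L;
        long start = 19 * hour, end = 20 * hour; // 19:00 .. 20:00 on day 0 (UTC)
        // Offsets 100..102 were produced at 16:38, 19:05 and 20:01 respectively.
        long[] times = {16 * hour + 38 * 60_000L, 19 * hour + 5 * 60_000L, 20 * hour + 60_000L};
        List<Fetched> batch = new ArrayList<>();
        for (int i = 0; i < times.length; i++) {
            batch.add(new Fetched(100 + i, encode(times[i], ("msg" + i).getBytes(StandardCharsets.UTF_8))));
        }
        System.out.println(offsetsInWindow(batch, start, end)); // prints [101]
    }
}
```

The filter checks every message rather than stopping at the first timestamp past `endMs`, because with several producers (or clock skew between them) embedded timestamps are not guaranteed to increase with the offset. A real hourly job would therefore keep reading for some margin past the end of the hour before deciding it is done.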
