In Kafka, how to get the exact offset according to producing time


Question

I need to get the messages produced in Kafka hour by hour throughout the day. Every hour I launch a job to consume the messages produced during the previous hour. For example, if the current time is 20:12, I will consume the messages produced between 19:00:00 and 19:59:59. That means I need the start offset for time 19:00:00 and the end offset for time 19:59:59. I used SimpleConsumer.getOffsetsBefore as shown in the "0.8.0 SimpleConsumer Example". The problem is that the returned offset does not match the timestamp given as a parameter. For example, when I pass the timestamp for 19:00:00, I get a message that was produced at 16:38:00.

Recommended answer

In Kafka there is currently no way to get an offset that corresponds to a particular timestamp - this is by design. As described near the top of Jay Kreps's Log Article, the offset number provides a sort of timestamp for the log that is decoupled from wall-clock time. With the offset as your notion of time, you can know whether any two systems are in a consistent state just by knowing what offset each has read up to. There is never any confusion about different clock times on different servers, leap years, daylight saving time, time zones, etc. It's kinda nice...

NOW... all that said, if you know your server went down at some time X then, practically speaking, you would really like to know the corresponding offset. You can get close. The log segment files on the Kafka machines are named after the first offset they contain, and their modification times tell you roughly when they were written; there exists a Kafka tool (that I can't find right now) that lets you know which offsets are associated with these files. This lookup only works at the granularity of whole segment files, which is why getOffsetsBefore can return an offset from hours before the time you asked for. If you want to know the exact timestamp, though, then you must encode the timestamp in the messages that you're sending to Kafka.
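As a rough illustration of that last suggestion, here is a minimal sketch in Python. It does not use any Kafka client: an in-memory list of `(offset, bytes)` pairs stands in for one partition, and the JSON envelope with a `ts` field, along with the helper names `previous_hour_window`, `encode` and `offsets_for_window`, are assumptions made up for this example rather than anything Kafka provides.

```python
import json
from datetime import datetime, timedelta, timezone


def previous_hour_window(now):
    """Return (start_ms, end_ms) for the full clock hour before `now`.

    The window is half-open: start_ms <= ts < end_ms.
    """
    end = now.replace(minute=0, second=0, microsecond=0)
    start = end - timedelta(hours=1)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


def encode(payload, produced_at_ms):
    """Wrap a payload together with the producer's clock reading.

    The {"ts": ..., "payload": ...} envelope is a hypothetical format.
    """
    return json.dumps({"ts": produced_at_ms, "payload": payload}).encode("utf-8")


def offsets_for_window(log, start_ms, end_ms):
    """Scan (offset, raw_bytes) pairs and return (first, last) offsets whose
    embedded timestamp falls in [start_ms, end_ms), or None if none do."""
    first = last = None
    for offset, raw in log:
        ts = json.loads(raw.decode("utf-8"))["ts"]
        if start_ms <= ts < end_ms:
            if first is None:
                first = offset
            last = offset
    return None if first is None else (first, last)


def ms(hour, minute, second):
    """Epoch milliseconds for a time of day on an arbitrary example date (UTC)."""
    moment = datetime(2015, 1, 1, hour, minute, second, tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


# Stand-in for one partition: offsets 100..105 with embedded produce times.
times = [(16, 38, 0), (18, 59, 59), (19, 0, 0), (19, 30, 0), (19, 59, 59), (20, 0, 0)]
log = [(100 + i, encode("msg-%d" % i, ms(*t))) for i, t in enumerate(times)]

# The job from the question, started at 20:12, wants the 19:00-19:59:59 hour.
now = datetime(2015, 1, 1, 20, 12, tzinfo=timezone.utc)
start_ms, end_ms = previous_hour_window(now)
window = offsets_for_window(log, start_ms, end_ms)
print(window)  # (102, 104)
```

The sketch scans linearly instead of binary-searching because timestamps taken from producer clocks are not guaranteed to increase with the offset. In a real job you would still need some coarse starting offset to scan from, for example the approximate one returned by getOffsetsBefore.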
