源和接收器主题的输入记录时间戳和输出记录时间戳是否相同? [英] input record timestamp and output record timestamp is same across both source and sink topics?

查看:36
本文介绍了源和接收器主题的输入记录时间戳和输出记录时间戳是否相同?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用处理器 API 创建了 kafka 流应用程序.

I create kafka streaming application using Processor API.

这是我如何创建一个主题以将时间戳附加到所有传入消息

Here is how i create a topic to attach timestamp to all incoming messages

kafka-topics.sh --create --zookeeper 本地主机:2181--replication-factor 1 --partitions 1 --topic topicName --config message.timestamp.type=CreateTime

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicName --config message.timestamp.type=CreateTime

工作流正在处理来自源主题的传入消息并将其发布到接收器主题.出于某种奇怪的原因,我在源和接收器主题消息中看到了相同的时间戳.例如,在源主题中消息创建时间是 T0 ,在接收器主题中也保持不变.

The workflow is processing the incoming messages from source topic and posting it to sink topic. For some strange reason, I have seen same timestamp coming in both source and sink topic messages. Say for ex, in source topic for a message create time is T0 , that remains same in sink topic as well.

我需要做什么才能在接收器主题消息中看到更新的时间戳?

What do i need to do to see the updated timestamp in the sink topic messages?

推荐答案

如果您使用 CreateTime 配置主题,则时间戳存储将是生产者提供的时间戳.

If you configure a topic with CreateTime the timestamp store will be the timestamp provided by the producer.

对于普通的 KafkaProducer 是您没有明确指定时间戳,KafkaProducer 使用 System.currentTimeMillis() 并将消息发送到经纪人.

For a plain KafkaProducer is you don't specify the timestamp explicitly, KafkaProducer uses System.currentTimeMillis() and send the message to the broker.

对于 Kafka Streams,如果您读取具有特定时间戳的输入记录,我们有专门的时间戳推理逻辑来计算结果记录的时间戳.因此,Kafka Streams 在将时间戳传递给内部使用的 KafkaProducer 时显式设置时间戳,因此生产者只使用这个时间戳而不使用当前挂钟时间.对于流处理,这通常是期望的行为.

For Kafka Streams, if you read input records with certain timestamps, we have dedicate timestamp inference logic to compute the timestamp for the result records. Thus, Kafka Streams set the timestamp explicitly when handing it to the internally used KafkaProducer and thus the producer does just use this timestamp and does not use current wall-clock-time. For stream processing, this is usually desired behavior.

如果您有一个简单的管道,它只是将数据从一个主题复制到另一个主题,则时间戳推理将使用输入记录时间戳作为输出记录时间戳.

If you have a simple pipeline that just copies data from one topic to another, the timestamp inference will use the input record timestamp as output record timestamp.

你可以做两件事来获得不同的语义:

There are two thing you can do to get different semantics:

  1. 为您的 Kafka Streams 应用程序配置 WallClockTimestampExtractor.对于这种情况,Kafka Stream 不会使用嵌入的记录时间戳,而是使用当前挂钟时间来导出输出记录的时间戳.
  2. 使用 AppendTime 而不是 CreateTime 配置您的输出主题.对于这种情况,代理始终使用当前代理挂钟时间覆盖生产者提供的记录时间戳.
  1. Configure WallClockTimestampExtractor for you Kafka Streams application. For this case, Kafka Stream will not use the embedded record timestamp but the current wall-clock time to derive the timestamp of the output record.
  2. Configure your output topic with AppendTime instead of CreateTime. For this case, the broker always overwrites the record timestamp provided by the producer with current broker wall-clock time.

这篇关于源和接收器主题的输入记录时间戳和输出记录时间戳是否相同?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆