在源主题和宿主题中,输入记录时间戳和输出记录时间戳是否相同? [英] input record timestamp and output record timestamp is same across both source and sink topics?

查看:129
本文介绍了在源主题和宿主题中,输入记录时间戳和输出记录时间戳是否相同?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我使用Processor API创建kafka流应用程序.

I create kafka streaming application using Processor API.

这是我创建主题以将时间戳记附加到所有传入消息的方式

Here is how i create a topic to attach timestamp to all incoming messages

kafka-topics.sh --create --zookeeper本地主机:2181 --replication-factor 1-分区1 --topic topicName --config message.timestamp.type = CreateTime

kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic topicName --config message.timestamp.type=CreateTime

工作流正在处理来自源主题的传入消息,并将其发布到接收器主题.由于某些奇怪的原因,我在源和接收主题消息中都看到了相同的时间戳. 例如,在消息的源主题中,创建时间为 T0 ,在接收器主题中也相同.

The workflow is processing the incoming messages from source topic and posting it to sink topic. For some strange reason, I have seen same timestamp coming in both source and sink topic messages. Say for ex, in source topic for a message create time is T0 , that remains same in sink topic as well.

我需要怎么做才能在接收器主题消息中查看更新的时间戳?

What do i need to do to see the updated timestamp in the sink topic messages?

推荐答案

如果使用CreateTime配置主题,则时间戳存储将是生产者提供的时间戳.

If you configure a topic with CreateTime the timestamp store will be the timestamp provided by the producer.

对于普通的KafkaProducer,您没有明确指定时间戳,KafkaProducer使用System.currentTimeMillis()并将消息发送给代理.

For a plain KafkaProducer is you don't specify the timestamp explicitly, KafkaProducer uses System.currentTimeMillis() and send the message to the broker.

对于Kafka Streams,如果您读取带有特定时间戳的输入记录,则我们将使用专门的时间戳推断逻辑来计算结果记录的时间戳.因此,Kafka Streams在将时间戳传递给内部使用的KafkaProducer时会显式设置时间戳,因此生产者仅使用此时间戳,而不使用当前的挂钟时间.对于流处理,这通常是期望的行为.

For Kafka Streams, if you read input records with certain timestamps, we have dedicate timestamp inference logic to compute the timestamp for the result records. Thus, Kafka Streams set the timestamp explicitly when handing it to the internally used KafkaProducer and thus the producer does just use this timestamp and does not use current wall-clock-time. For stream processing, this is usually desired behavior.

如果您有一个简单的管道,只是将数据从一个主题复制到另一个主题,则时间戳推断将使用输入记录时间戳作为输出记录时间戳.

If you have a simple pipeline that just copies data from one topic to another, the timestamp inference will use the input record timestamp as output record timestamp.

您可以做两件事来获得不同的语义:

There are two thing you can do to get different semantics:

  1. 为您的Kafka Streams应用程序配置WallClockTimestampExtractor.对于这种情况,Kafka Stream将不使用嵌入式记录时间戳,而是使用当前的挂钟时间来导出输出记录的时间戳.
  2. 使用AppendTime而不是CreateTime配置输出主题.在这种情况下,代理始终会用当前代理的挂钟时间覆盖生产者提供的记录时间戳.
  1. Configure WallClockTimestampExtractor for you Kafka Streams application. For this case, Kafka Stream will not use the embedded record timestamp but the current wall-clock time to derive the timestamp of the output record.
  2. Configure your output topic with AppendTime instead of CreateTime. For this case, the broker always overwrites the record timestamp provided by the producer with current broker wall-clock time.

这篇关于在源主题和宿主题中,输入记录时间戳和输出记录时间戳是否相同?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆