Kafka Streams:标点与流程 [英] Kafka Streams: Punctuate vs Process

查看:21
本文介绍了Kafka Streams:标点与流程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在流应用程序中的单个任务中,以下两个方法是否独立运行(意味着当方法process"处理来自上游源的传入消息时,方法punctuate"也可以基于指定的时间表和 WALL_CLOCK_TIME 作为 PunctuationType?)或者它们共享同一个线程,所以它是在给定时间运行的一个线程,如果是这样,如果 process 方法不断从上游源获取消息,那么 punctuate 方法将永远不会被调用?

  • Processor.process(K key, V value)
    使用给定的键和值处理记录.

  • ProcessorContext.schedule(long interval, PunctuationType type, Punctuator callback)
    为处理器安排定期操作.

另外,请说明在 punctuate 方法中分区 id 值为 -1 是什么意思.punctuate 方法不是特定于任何分区吗?

  • int ProcessorContext.partition()
    返回当前输入记录的分区id;如果它不可用,则可能是 -1(例如,如果从标点调用中调用此方法)

解决方案

这两种方法都在一个线程中执行.如果有输入数据与否,将独立调用基于挂钟的 punctuate():在调用 process() 之间,线程检查系统时间并调用 punctuate() 如有必要.

对于分区信息:是的,标点符号与分区无关.当然,标点符号是特定于任务的,但是,一个任务可能有多个输入分区(例如,如果它执行 mergejoin),所以不清楚是哪个分区要传入的信息.为简单起见,单分区情况与多分区情况的处理方式相同,标点符号与分区分离.

In a single task within the stream app, does the following two methods run independently (meaning while the method "process" is handling an incoming message from the upstream source, the method "punctuate" can also run in parallel based on the specified schedule and WALL_CLOCK_TIME as the PunctuationType?) OR do they share same thread so it's either one that runs at a given time, if so would the punctuate method never gets invoked if the process method keeps continuously getting messages from the upstream source?

  • Processor.process(K key, V value)
    Process the record with the given key and value.

  • ProcessorContext.schedule(long interval, PunctuationType type, Punctuator callback)
    Schedules a periodic operation for processors.

Also, please clarify what does it mean by partition id value being -1 in punctuate method. Is punctuate method not specific to any partition?

  • int ProcessorContext.partition()
    Returns the partition id of the current input record; could be -1 if it is not available (for example, if this method is invoked from the punctuate call)

解决方案

Both methods are executed in a single thread. Wall-clock based punctuate() will be called independently if there is input data or not: Between calls to process() the thread checks the system time and calls punctuate() if necessary.

For the partition information: yes, punctuations are independent of partitions. Of course, punctuations are specific to a task, however, a task might have multiple input partitions (for example, if it executes a merge or join) so it's unclear what partition information to pass in. For simplicity, single partition case is treated the same way as multi-partition case and punctuations are decouples from partitions.

这篇关于Kafka Streams:标点与流程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆