Kafka Streams:标点与流程 [英] Kafka Streams: Punctuate vs Process

查看:45
本文介绍了Kafka Streams:标点与流程的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

在流应用程序中的单个任务中,以下两种方法是否独立运行(这意味着方法"在处理来自上游源的传入消息时,方法标点"也可以根据是否将指定的日程表和WALL_CLOCK_TIME指定为PunctuationType?)还是它们共享同一个线程,以便它在给定时间运行,如果是这样,如果process方法不断不断地从上游源获取消息,那么就不会调用标点方法吗?>

  • Processor.process(K键,V值)
    用给定的键和值处理记录.

  • ProcessorContext.schedule(长间隔,PunctuationType类型,Punctuator回调)
    安排处理器的定期操作.

另外,请澄清标点方法中分区id值为-1是什么意思.标点方法不是特定于任何分区的吗?

  • int ProcessorContext.partition()
    返回当前输入记录的分区标识;如果不可用,则可以为-1(例如,如果从标点调用中调用此方法)

解决方案

两个方法都在单个线程中执行.如果有输入数据,则将基于壁钟的 punctuate()进行独立调用:在调用 process()之间,线程将检查系统时间并调用如有必要,请punate().

有关分区信息:是的,标点符号与分区无关.当然,标点符号是特定于任务的,但是,任务可能具有多个输入分区(例如,如果它执行 merge join ),则不清楚是哪个分区要传递的信息.为简单起见,单分区大小写与多分区大小写的处理方式相同,标点符号与分区分离.

In a single task within the stream app, does the following two methods run independently (meaning while the method "process" is handling an incoming message from the upstream source, the method "punctuate" can also run in parallel based on the specified schedule and WALL_CLOCK_TIME as the PunctuationType?) OR do they share same thread so it's either one that runs at a given time, if so would the punctuate method never gets invoked if the process method keeps continuously getting messages from the upstream source?

  • Processor.process(K key, V value)
    Process the record with the given key and value.

  • ProcessorContext.schedule(long interval, PunctuationType type, Punctuator callback)
    Schedules a periodic operation for processors.

Also, please clarify what does it mean by partition id value being -1 in punctuate method. Is punctuate method not specific to any partition?

  • int ProcessorContext.partition()
    Returns the partition id of the current input record; could be -1 if it is not available (for example, if this method is invoked from the punctuate call)

解决方案

Both methods are executed in a single thread. Wall-clock based punctuate() will be called independently if there is input data or not: Between calls to process() the thread checks the system time and calls punctuate() if necessary.

For the partition information: yes, punctuations are independent of partitions. Of course, punctuations are specific to a task, however, a task might have multiple input partitions (for example, if it executes a merge or join) so it's unclear what partition information to pass in. For simplicity, single partition case is treated the same way as multi-partition case and punctuations are decouples from partitions.

这篇关于Kafka Streams:标点与流程的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆