Aggregation over a specific partition in Apache Kafka Streams

Question

Let's say I have a Kafka topic named SensorData to which two sensors, S1 and S2, send data (timestamp and value) on two different partitions, e.g. S1 -> P1 and S2 -> P2. I need to aggregate the values of these two sensors separately, say by calculating the average sensor value over a time window of 1 hour and writing it to a new topic, SensorData1Hour. Given this scenario:

  1. How can I select a specific topic partition using the KStreamBuilder#stream method?
  2. Is it possible to apply some aggregation function over two (multiple) different partitions from same topic?

Recommended answer

You cannot (directly) access single partitions and you cannot (directly) apply an aggregation function over multiple partitions.

Aggregations are always done per key: http://docs.confluent.io/current/streams/developer-guide.html#stateful-transformations

  1. Thus, you could use a different key for each partition and then aggregate by key. See http://docs.confluent.io/current/streams/developer-guide.html#windowing-a-stream

The simplest way is to let each of your producers apply a key to each message right away.
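To make "aggregate by key within a window" concrete, here is a minimal plain-Java model of it. It deliberately does not use the Kafka Streams API (in the Streams DSL the equivalent steps would be grouping by key and applying a windowed aggregation); the class `SensorWindowAverage`, the `Reading` type, and the `key@windowStart` slot naming are all hypothetical names chosen for illustration. It assumes each producer has already set the sensor id as the message key.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of a per-key average over 1-hour tumbling windows.
// Not Kafka Streams code; every name here is an illustrative assumption.
class SensorWindowAverage {

    // One sensor reading; `key` plays the role of the Kafka message key.
    static final class Reading {
        final String key;
        final long timestampMs;
        final double value;

        Reading(String key, long timestampMs, double value) {
            this.key = key;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    static final long WINDOW_MS = 60 * 60 * 1000L; // 1 hour

    // Returns the average value for each "key@windowStart" slot.
    static Map<String, Double> averagePerKeyAndWindow(List<Reading> readings) {
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Reading r : readings) {
            // Align the timestamp to the start of its 1-hour window.
            long windowStart = r.timestampMs - (r.timestampMs % WINDOW_MS);
            String slot = r.key + "@" + windowStart;
            double[] acc = sumAndCount.computeIfAbsent(slot, k -> new double[2]);
            acc[0] += r.value; // running sum
            acc[1] += 1;       // running count
        }
        Map<String, Double> averages = new TreeMap<>();
        sumAndCount.forEach((slot, acc) -> averages.put(slot, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
            new Reading("S1", 0L, 10.0),
            new Reading("S1", 1_800_000L, 20.0),  // same hour as the first S1 reading
            new Reading("S2", 600_000L, 5.0),
            new Reading("S1", 3_600_000L, 40.0)); // next hour
        System.out.println(averagePerKeyAndWindow(readings));
    }
}
```

Because S1 and S2 carry different keys, their readings never mix, regardless of which partition each record was written to.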

  2. If you want to aggregate over multiple partitions, you first need to set a new key (e.g., using selectKey()) and assign the same key to all data you want to aggregate together. To aggregate all partitions, you would use a single key value, but keep in mind that this can quickly become a bottleneck, because every record is then processed under one key.
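The re-keying idea can be sketched in plain Java as follows. This is a model of the concept only, not Kafka Streams code: the `keySelector` function stands in for what selectKey() would do, windowing is left out for brevity, and the names `RekeyThenAggregate`, `Reading`, and `averageByNewKey` are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java model of "assign a new key, then aggregate by that key".
// Not Kafka Streams code; every name here is an illustrative assumption.
class RekeyThenAggregate {

    // A reading tagged with the partition it came from.
    static final class Reading {
        final int partition;
        final double value;

        Reading(int partition, double value) {
            this.partition = partition;
            this.value = value;
        }
    }

    // Averages values grouped by whatever key the selector assigns.
    static Map<String, Double> averageByNewKey(List<Reading> readings,
                                               Function<Reading, String> keySelector) {
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Reading r : readings) {
            double[] acc = sumAndCount.computeIfAbsent(keySelector.apply(r), k -> new double[2]);
            acc[0] += r.value; // running sum
            acc[1] += 1;       // running count
        }
        Map<String, Double> averages = new TreeMap<>();
        sumAndCount.forEach((key, acc) -> averages.put(key, acc[0] / acc[1]));
        return averages;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
            new Reading(1, 10.0), new Reading(1, 20.0), new Reading(2, 30.0));

        // One key per partition: the two sensors stay separate.
        System.out.println(averageByNewKey(readings, r -> "P" + r.partition));

        // A single shared key: everything is aggregated together.
        System.out.println(averageByNewKey(readings, r -> "all"));
    }
}
```

With the single shared key, all records funnel into one aggregate, which is why the answer warns that this choice can become a bottleneck.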
