Aggregation over a specific partition in Apache Kafka Streams

Question

Let's say I have a Kafka topic named SensorData, to which two sensors, S1 and S2, send data (timestamp and value) on two different partitions, e.g. S1 -> P1 and S2 -> P2. Now I need to aggregate the values for these two sensors separately, say by calculating the average sensor value over a 1-hour time window and writing it to a new topic, SensorData1Hour. Given this scenario:

  1. How can I select a specific topic partition using the KStreamBuilder#stream method?
  2. Is it possible to apply an aggregation function over two (or more) different partitions of the same topic?

Recommended answer

You cannot (directly) access single partitions, and you cannot (directly) apply an aggregation function over multiple partitions.

Aggregations are always done per key: http://docs.confluent.io/current/streams/developer-guide.html#stateful-transformations

  1. Thus, you can use a different key for each partition and then aggregate by key. See http://docs.confluent.io/current/streams/developer-guide.html#windowing-a-stream

The simplest way is to have each of your producers set a key on every message right away, for example the sensor ID.
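To make "aggregate by key over a window" concrete, here is a minimal plain-Java sketch of the idea. It does not use the Kafka Streams API: the `KeyedWindowAverage` and `Reading` types, the `"key@windowStart"` bucket naming, and the in-memory list of readings are all assumptions made up for illustration. It only shows that once every reading carries a key such as `S1` or `S2`, a 1-hour average per sensor falls out of grouping by (key, window start).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these types come from Kafka Streams.
final class KeyedWindowAverage {
    // Size of one tumbling window: 1 hour in milliseconds.
    static final long WINDOW_MS = 60L * 60L * 1000L;

    /** One keyed reading; the key plays the role of the Kafka message key. */
    static final class Reading {
        final String key;
        final long timestampMs; // assumed non-negative
        final double value;

        Reading(String key, long timestampMs, double value) {
            this.key = key;
            this.timestampMs = timestampMs;
            this.value = value;
        }
    }

    /** Returns a map from "key@windowStartMs" to the average value in that window. */
    static Map<String, Double> averagePerKeyAndWindow(List<Reading> readings) {
        // For each (key, window) bucket keep a running [sum, count] pair.
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Reading r : readings) {
            long windowStart = r.timestampMs - (r.timestampMs % WINDOW_MS);
            String bucket = r.key + "@" + windowStart;
            double[] acc = sumAndCount.computeIfAbsent(bucket, b -> new double[2]);
            acc[0] += r.value;
            acc[1] += 1;
        }
        // Turn each [sum, count] pair into an average.
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

With readings `S1` at 0 ms (10.0), `S1` at 30 minutes (20.0), `S2` at 1 minute (5.0) and `S1` at 60 minutes (40.0), the sketch yields three buckets: `S1@0` with 15.0, `S1@3600000` with 40.0, and `S2@0` with 5.0. The two sensors never mix because they never share a key.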

  2. If you want to aggregate over multiple partitions, you first need to set a new key (e.g., using selectKey()) and assign the same key to all the data you want to aggregate together. To aggregate over all partitions, you would use a single key value; keep in mind, however, that this can quickly become a bottleneck, since all records then share one key.
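The re-keying step can be sketched the same way in plain Java. Again, this is not the Kafka Streams API: `RekeyThenAggregate`, `Reading`, and `averageByNewKey` are hypothetical names, and the `keySelector` function merely stands in for the role selectKey() plays. The point is that whatever key the selector returns decides which records are averaged together.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Plain-Java illustration only; none of these types come from Kafka Streams.
final class RekeyThenAggregate {
    /** One reading with its original key (for example the sensor ID). */
    static final class Reading {
        final String key;
        final double value;

        Reading(String key, double value) {
            this.key = key;
            this.value = value;
        }
    }

    /**
     * Assigns each reading a new key via keySelector, then averages per new key.
     * Readings that map to the same new key are aggregated together.
     */
    static Map<String, Double> averageByNewKey(List<Reading> readings,
                                               Function<Reading, String> keySelector) {
        // For each new key keep a running [sum, count] pair.
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (Reading r : readings) {
            String newKey = keySelector.apply(r);
            double[] acc = sumAndCount.computeIfAbsent(newKey, k -> new double[2]);
            acc[0] += r.value;
            acc[1] += 1;
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            averages.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return averages;
    }
}
```

Passing a selector that returns a constant, such as `reading -> "all"`, collapses every reading into one group, which mirrors the single-key case above and shows why it can become a bottleneck: one group receives all the data. Passing `reading -> reading.key` keeps the sensors separate.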
