Kafka Streams - Hopping windows - deduplicate keys


Question

I'm running a hopping window aggregation with a 4-hour window that advances every 5 minutes. Because the hopping windows overlap, I'm getting duplicate keys with different aggregated values.

TimeWindows.of(240 * 60 * 1000L).advanceBy(5 * 60 * 1000L)
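To see why one key shows up many times, it can help to count how many overlapping windows a single record falls into. The following is a minimal sketch in plain Java, assuming windows are aligned to the epoch and cover [start, start + size). The class and method names are hypothetical and are not part of the Kafka Streams API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper (hypothetical, not part of Kafka Streams): lists the start
// times of every hopping window that contains a given record timestamp.
class HoppingWindowSketch {

    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest epoch-aligned window start whose window still contains the timestamp.
        long start = (Math.max(0L, timestampMs - sizeMs + advanceMs) / advanceMs) * advanceMs;
        while (start <= timestampMs) {
            starts.add(start);
            start += advanceMs;
        }
        return starts;
    }

    public static void main(String[] args) {
        long sizeMs = 240 * 60 * 1000L;          // 4-hour window, as in the question
        long advanceMs = 5 * 60 * 1000L;         // advancing every 5 minutes
        long timestampMs = 10 * 60 * 60 * 1000L; // assumed record time: 10 hours after the epoch
        List<Long> starts = windowStartsFor(timestampMs, sizeMs, advanceMs);
        System.out.println(starts.size() + " windows contain this record"); // prints 48
    }
}
```

With these settings each record contributes to 240 / 5 = 48 windows, so the same key appears once per window, each with its own aggregate.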

How do I eliminate the duplicate keys with repeating data, or pick only the record that holds the latest value for each key?

Recommended answer

If I understand you correctly, this is expected behavior. You are not seeing "duplicate" keys; you are seeing continuous updates for the same key.

Consider:

# Extreme case: record caches disabled (size set to 0)
alice->1, alice->2, alice->3, alice->4, ..., alice->100, ...

# With record caches enabled, you would see something like this.
alice->23, alice->59, alice->100, ...
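The effect of the cache can be imitated with a toy model: updates to the same key overwrite each other in a buffer, and only the latest value is forwarded when the buffer is flushed. This is a simplified sketch, not Kafka Streams' actual cache implementation. The class name and the fixed flush schedule are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of a record cache (hypothetical; not how Kafka Streams implements it).
class RecordCacheSketch {
    private final Map<String, Long> buffer = new LinkedHashMap<>();
    private final List<String> emitted = new ArrayList<>();

    // A newer update for the same key overwrites the buffered one.
    void update(String key, long value) {
        buffer.put(key, value);
    }

    // On flush, only the latest buffered value per key is sent downstream.
    void flush() {
        buffer.forEach((k, v) -> emitted.add(k + "->" + v));
        buffer.clear();
    }

    // Feeds counts 1..updates for "alice" and flushes after every flushEvery updates.
    static List<String> run(int updates, int flushEvery) {
        RecordCacheSketch cache = new RecordCacheSketch();
        for (int count = 1; count <= updates; count++) {
            cache.update("alice", count);
            if (count % flushEvery == 0) {
                cache.flush();
            }
        }
        cache.flush(); // final flush for anything still buffered
        return cache.emitted;
    }

    public static void main(String[] args) {
        System.out.println(run(100, 1).size()); // 100: every update is visible
        System.out.println(run(100, 40));       // [alice->40, alice->80, alice->100]
    }
}
```

Flushing after every update corresponds to the "caches disabled" case above, while flushing less often yields fewer, later updates per key.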

Take a look at the explanation at http://docs.confluent.io/current/streams/developer-guide.html#streams-developer-guide-memory-management, which describes this in more detail. If you want to see fewer "duplicates" per record key, you can increase the size of the record caches (when using the DSL) via cache.max.bytes.buffering, aka StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG, in your application's configuration. There is also an interplay with commit.interval.ms, because the caches are flushed downstream on every commit.
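A minimal configuration sketch using the two settings named above might look as follows. The 50 MB cache size and 60-second commit interval are arbitrary illustrative values, not recommendations, and the class and method names are hypothetical. Required settings such as application.id and bootstrap.servers are omitted here.

```java
import java.util.Properties;

// Hypothetical helper that builds the cache-related part of a Streams configuration.
class StreamsCacheConfigSketch {

    static Properties cacheFriendlyConfig() {
        Properties props = new Properties();
        // Same key as StreamsConfig.CACHE_MAX_BYTES_BUFFERING_CONFIG.
        // Assumed value: 50 MB of record cache shared across the instance's threads.
        props.put("cache.max.bytes.buffering", 50 * 1024 * 1024L);
        // Assumed value: commit (and therefore flush the caches) every 60 seconds.
        props.put("commit.interval.ms", 60_000L);
        return props;
    }

    public static void main(String[] args) {
        System.out.println(cacheFriendlyConfig());
    }
}
```

A larger cache and a longer commit interval both tend to reduce the number of intermediate updates emitted per key, at the cost of higher memory use and later results.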

If you are wondering why the Kafka Streams API behaves this way in the first place, I'd recommend the blog post https://www.confluent.io/blog/watermarks-tables-event-time-dataflow-model/ that was published earlier this week.
