Kafka KStream application - temp file cleanup

Question

It seems that my KStream-based application has been piling up many GBs of files (.sst, Log.old.<stamp>, etc).

Will these get cleaned up on their own or is this something I need to keep an eye on? Some param to be set to cull them?

Recommended answer

About these local/temp files: Some of these files are application state, and those should account for the majority of space consumed. Your application may be "piling up" many GBs of files simply because your application is actually managing a lot of state. These files can be reconstructed (automatically) by replaying the state's changelog from Kafka if you delete them, but this may take some time.
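To check whether state really accounts for most of the space, it can help to total the on-disk size per file type before deleting anything. Below is a minimal sketch that uses only the Java standard library. The class name `StateDirUsage`, the bucket names, and the file-name patterns are illustrative assumptions based on the file names mentioned in the question, and `/tmp/kafka-streams` is assumed as the state directory; pass your own `state.dir` value if it differs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

final class StateDirUsage {

    // Classify a file name into a coarse bucket. The patterns are assumptions
    // taken from the file names in the question (.sst, Log.old.<stamp>).
    static String bucketOf(String fileName) {
        if (fileName.endsWith(".sst")) {
            return "sst";
        }
        // Case-insensitive prefix match, so both "LOG" and "Log.old.<stamp>" land here.
        if (fileName.regionMatches(true, 0, "LOG", 0, 3)) {
            return "log";
        }
        return "other";
    }

    // Walk the directory tree and sum regular-file sizes (in bytes) per bucket.
    static Map<String, Long> usageByBucket(Path root) throws IOException {
        Map<String, Long> totals = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            for (Path p : (Iterable<Path>) paths::iterator) {
                if (Files.isRegularFile(p)) {
                    totals.merge(bucketOf(p.getFileName().toString()), Files.size(p), Long::sum);
                }
            }
        }
        return totals;
    }

    public static void main(String[] args) throws IOException {
        // Assumed location; pass the application's actual state directory as the first argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "/tmp/kafka-streams");
        usageByBucket(root).forEach((bucket, bytes) ->
                System.out.printf("%-6s %,d bytes%n", bucket, bytes));
    }
}
```

If the `sst` bucket dominates, the space is most likely genuine store data rather than leftover logs.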

Will these get cleaned up on their own or is this something I need to keep an eye on? Some param to be set to cull them?

Some cleaning up is done, but as I wrote above, the files most probably consume that space for a reason. Perhaps you can share a snippet of the app's processing topology as well as some info about the data the app is processing, which might help to understand whether the consumed space seems about right or whether there might be an issue.
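On the question of parameters: two Kafka Streams settings relate to local state, `state.dir` (where the local state stores live) and `state.cleanup.delay.ms` (how long to wait before deleting the local state of a task that has migrated to another instance). The sketch below only builds the corresponding `java.util.Properties` entries with plain string keys. The class name, directory, and delay value are placeholders, and a real application would still need `application.id`, `bootstrap.servers`, and the rest of its configuration.

```java
import java.util.Properties;

final class StreamsStateConfig {

    // Build only the state-related part of a Kafka Streams configuration.
    static Properties stateProperties(String stateDir, long cleanupDelayMs) {
        Properties props = new Properties();
        // Directory where Kafka Streams keeps its local state stores.
        props.setProperty("state.dir", stateDir);
        // Wait time before local state of a migrated task is deleted.
        props.setProperty("state.cleanup.delay.ms", Long.toString(cleanupDelayMs));
        return props;
    }

    public static void main(String[] args) {
        // Both values are placeholders chosen for illustration.
        Properties props = stateProperties("/var/lib/my-streams-app", 60_000L);
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Note that the cleanup delay only affects state for tasks the instance no longer owns; it does not shrink the stores of active tasks.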

Clean up: The latest version of Kafka (0.10.0.1) now ships with an application reset tool for Kafka Streams plus some accompanying API methods that help with cleaning/resetting, see "Data Reprocessing with Kafka Streams: Resetting a Streams Application". That said, I am not sure whether you intend to clean up files because you have stopped the application and want to get rid of all the local data, or because you want to do some "garbage collection" while the app is still running. If it's the latter (GC), then in general there's no need to -- the files are there for a good reason, and most probably will just be recreated.
