Flink with Ceph as the persistent storage

Question

Flink's documentation suggests that Ceph can be used as persistent storage for state. https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/stream/checkpointing.html

Considering that Ceph is a transactional database, wouldn't it have an adverse effect on Flink's performance?

Recommended answer

Ceph describes itself as a "unified, distributed storage system" and provides a network file system API. As such, it should work seamlessly with Flink's state backends that persist checkpoints to a remote file system.

I'm not aware of people using Ceph (HDFS and S3 are more commonly used) and have no information about its performance. However, note that Flink can write checkpoints asynchronously, so the performance of the storage system does not affect the processing speed of a Flink application. It might, however, constrain the interval at which checkpoints are taken.
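To make the point about the checkpoint interval concrete, here is a rough back-of-envelope sketch. It is not a Flink API: the function name, the `headroom` safety factor, and the example numbers are all hypothetical, and it assumes full (non-incremental) checkpoints whose duration is dominated by the write throughput of the storage system.

```python
import math


def min_checkpoint_interval_s(state_bytes, write_bytes_per_s, headroom=2.0):
    """Rough lower bound on a sustainable checkpoint interval, in seconds.

    Assumption: each checkpoint uploads the full state, so its duration is
    about state_bytes / write_bytes_per_s. `headroom` is a hypothetical
    safety factor so that one checkpoint finishes well before the next starts.
    """
    if write_bytes_per_s <= 0:
        raise ValueError("write throughput must be positive")
    upload_s = state_bytes / write_bytes_per_s
    return math.ceil(upload_s * headroom)


# Example with made-up numbers: 20 GiB of state, 200 MiB/s sustained writes.
interval = min_checkpoint_interval_s(20 * 1024**3, 200 * 1024**2)
print(interval)  # 205
```

With these illustrative numbers a single checkpoint takes about 102 seconds to upload, so an interval much shorter than a few minutes would not be sustainable, even though record processing itself is not slowed down.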

Update (Feb. 2018): I noticed that multiple users have reported on Flink's user mailing list that they are using Ceph with Flink.

Update 2: Flink works fine with Ceph over the S3 protocol, and both of Flink's S3 FileSystem plugins (Presto and Hadoop) work with it.
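As an illustration of the S3 route, a flink-conf.yaml fragment along the following lines could point one of the S3 FileSystem plugins at a Ceph S3 endpoint. This is a sketch, not a tested setup: the endpoint host and port, bucket name, and credentials are placeholders, and the option names are the ones I recall from Flink's S3 file system documentation, so check them against your Flink version.

```yaml
# Placeholder endpoint of the Ceph S3 gateway (hypothetical host and port)
s3.endpoint: http://ceph-gateway.example.internal:7480
# Address buckets by path rather than by virtual host name
s3.path.style.access: true
# Placeholder credentials
s3.access-key: YOUR_ACCESS_KEY
s3.secret-key: YOUR_SECRET_KEY

# Write checkpoints to a (hypothetical) bucket on that endpoint
state.checkpoints.dir: s3://flink-checkpoints/my-job
# Checkpoint interval; choose it with the storage throughput in mind
execution.checkpointing.interval: 5min
```

The chosen S3 plugin JAR also has to be present in Flink's plugins directory for the s3:// scheme to resolve.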
