Apache Flink - is it possible to evenly distribute slot sharing groups?


Question

We have a pipeline whose operations are split into two workloads. Source -> Transform form the first group and are CPU-intensive, so they are placed into the same slot sharing group, let's say source. The Sink is a RAM-intensive workload, since it uses bulk upload and holds a large amount of data in memory; it is sent to the sink slot sharing group.

Additionally, the Source -> Transform workload and the Sink workload run at different parallelism levels, as the first one is limited by the source parallelism. For example, Source -> Transform has a parallelism of 50, while the Sink has a parallelism of 78. And we have 8 TMs, each with 16 cores (and therefore 16 slots).
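
For reference, here is a minimal sketch of how such a topology could be wired up with the DataStream Scala API; the sequence source, the map, and the print sink are trivial stand-ins for the real operators, and the group names are just the ones described above:

import org.apache.flink.streaming.api.scala._

object SlotSharingGroupsSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // CPU-intensive part: Source -> Transform runs with parallelism 50,
    // pinned to the "source" slot sharing group.
    val transformed = env
      .generateSequence(0L, 1000000L)   // stand-in for the real source
      .setParallelism(50)
      .slotSharingGroup("source")
      .map(x => x * 2)                  // stand-in for the CPU-heavy transform
      .setParallelism(50)
      .slotSharingGroup("source")

    // RAM-intensive part: the sink runs with parallelism 78 in its own
    // "sink" slot sharing group, so it never shares a slot with the above.
    transformed
      .print()                          // stand-in for the real bulk-upload sink
      .setParallelism(78)
      .slotSharingGroup("sink")

    env.execute("Slot sharing groups sketch")
  }
}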

In this case, the ideal slot allocation strategy for us seems to be 6-7 slots on each TM for Source -> Transform and the rest for the Sink (50 + 78 = 128 subtasks, which exactly fills the 8 x 16 = 128 slots), so that both the CPU and the RAM workload end up roughly evenly distributed across all TMs.

So, I wonder whether there is some config setting that tells Flink to distribute slot sharing groups evenly?

I only found the cluster.evenly-spread-out-slots config parameter, but I'm not sure whether it actually spreads out slot sharing groups evenly rather than just slots - for example, I get TMs with 10 Source -> Transform tasks, whereas I would expect 6 or 7.

So, the question is whether it is possible to tell Flink to distribute slot sharing groups evenly across the cluster, or whether there is some other way to achieve this?

Distribute a Flink operator evenly across taskmanagers seems somewhat similar to my question, but I'm mostly asking about the distribution of slot sharing groups. That topic also only suggests using cluster.evenly-spread-out-slots, but perhaps something has changed since then.

Accepted answer

I was able to find a workaround to get the even distribution of slot sharing groups.

Starting from Flink 1.9.2, an even task distribution feature has been introduced, which can be turned on via cluster.evenly-spread-out-slots: true in flink-conf.yaml: FLINK-12122 Spread out tasks evenly across all available registered TaskManagers. I tried to enable it and it didn't work. After digging a bit, I found a developer's comment stating that this feature effectively works only in standalone mode, as it requires the resources to be allocated up front - https://issues.apache.org/jira/browse/FLINK-12122?focusedCommentId=17013089&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17013089:

the feature only guarantees spreading out tasks across the set of TMs which are registered at the time of scheduling. Hence, when you are using the active Yarn mode and submit the first job, then there won't be any TMs registered. Consequently, Flink will allocate the first container, fill it up and then only allocate a new container. However, if you start Flink in standalone mode or after your first job finishes on Yarn there are still some TMs registered, then the next job would be spread out.

So, the idea is to start a detached yarn session with an increased idle-container timeout, first submit a short-lived fake job that simply acquires the required amount of resources from YARN and completes, and then immediately start the main pipeline, which gets assigned to the already allocated containers. In this case cluster.evenly-spread-out-slots: true does the trick and distributes all slot sharing groups evenly.
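
As a rough illustration, the relevant flink-conf.yaml entries could look like the following; the timeout value is just the one I ended up using:

# Spread slots across all registered TaskManagers instead of filling them up one by one.
cluster.evenly-spread-out-slots: true

# Keep idle TaskManager containers long enough to submit the main job
# after the pre-allocation job finishes (value in milliseconds).
resourcemanager.taskmanager-timeout: 60000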

So, to sum up, the following was done to get evenly distributed slot sharing groups within the job:

  1. resourcemanager.taskmanager-timeout was increased so that the main job can be submitted before the containers of idle task managers are released. I increased it to 1 minute, which was more than enough.
  2. A detached yarn session was started and the jobs were submitted to it.
  3. The main job was tweaked to first run a fake job that simply allocates the resources. In my case, this simple snippet, executed before the main pipeline is configured, does the trick:

// Resource pre-allocation job: a trivial pipeline executed at the maximum
// parallelism so that YARN allocates all required containers before the
// main pipeline is submitted.
import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

val job = env
    .fromElements(0)
    .map { x =>
        x * 2
    }
    // parallelismMax should cover all slots needed by the main job
    .setParallelism(parallelismMax)
    .print()

// Run the fake job to completion; the containers stay allocated until
// resourcemanager.taskmanager-timeout expires.
val jobResult = env.execute("Resources pre-allocation job")
println(jobResult)

print("Done. Starting main job!")
