Apache Flink - is it possible to evenly distribute slot sharing groups?

Problem description


We have a pipeline whose operations are split into 2 workloads. Source -> Transform form the first group and are CPU-intensive workloads; they are put into the same slot sharing group, let's say source. And Sink is a RAM-intensive workload, as it uses bulk upload and holds a large amount of data in memory; it is sent to the sink slot sharing group.


Additionally, the Source -> Transform workload and the Sink workload have different parallelism levels, as the first one is limited by the source parallelism. So, for example, we have a Source -> Transform parallelism of 50, while the Sink parallelism equals 78. And we have 8 TMs, each with 16 cores (and therefore slots).
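
For illustration, here is a minimal sketch of such a topology; the source, transform, and sink bodies are stand-ins rather than our actual operators, but slotSharingGroup and setParallelism are the standard DataStream API calls:

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// CPU-intensive part: Source -> Transform pinned to the "source" group.
val transformed = env
    .generateSequence(0L, 1000000L)  // stand-in for the real source
    .slotSharingGroup("source")
    .map(_ * 2)                      // stand-in for the CPU-heavy transform
    .setParallelism(50)
    .slotSharingGroup("source")      // explicit, though inherited by default

// RAM-intensive part: the Sink gets its own group and higher parallelism.
transformed
    .print()                         // stand-in for the bulk-upload sink
    .setParallelism(78)
    .slotSharingGroup("sink")

env.execute("slot-sharing-groups example")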


In this case, the ideal slot allocation strategy for us seems to be allocating 6-7 slots on each TM for Source -> Transform and the rest for Sink, so that both the CPU- and RAM-intensive workloads end up roughly evenly distributed across all TMs (50 + 78 = 128 tasks exactly fills the 8 x 16 = 128 available slots).


So, I wonder whether there is some config setting which tells Flink to distribute slot sharing groups evenly?


I only found the cluster.evenly-spread-out-slots config parameter, but I'm not sure whether it actually distributes slot sharing groups evenly rather than just slots - for example, I get TMs with 10 Source -> Transform tasks, while I would expect 6 or 7.
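
For reference, this is how I enabled it (the standard flink-conf.yaml key):

cluster.evenly-spread-out-slots: true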


So, the question is whether it is possible to tell Flink to distribute slot sharing groups evenly across the cluster? Or is there any other way to do it?


Distribute a Flink operator evenly across taskmanagers seems a bit similar to my question, but I'm mostly asking about the distribution of slot sharing groups. That topic also only contains the suggestion of using cluster.evenly-spread-out-slots, but something may have changed since then.

Answer


I was able to find a workaround to get an even distribution of slot sharing groups.


Starting from Flink 1.9.2, an even task distribution feature has been introduced, which can be turned on via cluster.evenly-spread-out-slots: true in flink-conf.yaml: FLINK-12122 Spread out tasks evenly across all available registered TaskManagers. I tried to enable it and it didn't work. After digging a bit, I managed to find a developer's comment stating that this feature works only in standalone mode, as it requires resources to be pre-allocated up front (https://issues.apache.org/jira/browse/FLINK-12122?focusedCommentId=17013089&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17013089):


the feature only guarantees spreading out tasks across the set of TMs which are registered at the time of scheduling. Hence, when you are using the active Yarn mode and submit the first job, then there won't be any TMs registered. Consequently, Flink will allocate the first container, fill it up and then only allocate a new container. However, if you start Flink in standalone mode or after your first job finishes on Yarn there are still some TMs registered, then the next job would be spread out.


So, the idea is to start a detached YARN session with an increased idle container timeout setting, first submit a short-lived fake job which simply acquires the required amount of resources from YARN and completes, and then immediately start the main pipeline, which gets assigned to the already allocated containers. In this case, cluster.evenly-spread-out-slots: true does the trick and distributes all slot sharing groups evenly.


So, to sum up, the following was done to get evenly distributed slot sharing groups within the job:

  1. resourcemanager.taskmanager-timeout was increased to allow the main job to be submitted before the containers of idle task managers are released. I increased it to 1 minute, which was more than enough.
  2. Started a yarn-session and submitted the jobs dynamically to it (see the session-startup sketch at the end).
  3. Tweaked the main job to first run a fake job which simply allocates the resources. In my case, this simple code does the trick before the main pipeline is configured:

import org.apache.flink.streaming.api.scala._

val env = StreamExecutionEnvironment.getExecutionEnvironment

// Fake job: occupies parallelismMax slots so that YARN allocates all
// containers up front; the job itself completes almost immediately.
env
    .fromElements(0)
    .map { x =>
        x * 2
    }
    .setParallelism(parallelismMax)
    .print()

val jobResult = env.execute("Resources pre-allocation job")
println(jobResult)

print("Done. Starting main job!")
