Why flink container vcore size is always 1


Problem description

I am running Flink on YARN (more precisely, in an AWS EMR YARN cluster).

I read in the Flink documentation and source code that, by default, for each task manager container, Flink requests the number of slots per task manager as the number of vcores when requesting resources from YARN. I also confirmed this in the source code:

// Resource requirements for worker containers
int taskManagerSlots = taskManagerParameters.numSlots();
// Defaults to one vcore per slot unless yarn.containers.vcores is set explicitly
int vcores = config.getInteger(ConfigConstants.YARN_VCORES, Math.max(taskManagerSlots, 1));
Resource capability = Resource.newInstance(containerMemorySizeMB, vcores);

resourceManagerClient.addContainerRequest(
    new AMRMClient.ContainerRequest(capability, null, null, priority));
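
To make that default concrete, here is a minimal, runnable sketch of the same fallback; the map standing in for the Flink configuration and the class name are illustrative only, not Flink's actual classes. With -ys 3 and no explicit yarn.containers.vcores setting, the request asks for 3 vcores:

import java.util.HashMap;
import java.util.Map;

// Illustration of the fallback above: vcores defaults to max(slots, 1)
// unless yarn.containers.vcores is configured explicitly.
public class VcoreDefaultSketch {
    public static void main(String[] args) {
        Map<String, Integer> config = new HashMap<>(); // stands in for the Flink configuration
        int taskManagerSlots = 3;                      // corresponds to -ys 3
        int vcores = config.getOrDefault("yarn.containers.vcores",
                Math.max(taskManagerSlots, 1));
        System.out.println("vcores requested from YARN: " + vcores); // prints 3
    }
}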

When I use -yn 1 -ys 3 to start Flink, I assume YARN will allocate 3 vcores for the single task manager container, but when I check the number of vcores per container in the YARN resource manager web UI, I always see that it is 1. I also see the vcore count reported as 1 in the YARN resource manager logs.

I debugged the Flink source code down to the lines pasted above and saw that the value of vcores is 3. This really confuses me; can anyone help clarify? Thanks.

Recommended answer

An answer from Kien Truong:

You have to enable CPU scheduling in YARN; otherwise it always shows that only 1 CPU is allocated to each container, regardless of how many vcores Flink tries to allocate. With the default DefaultResourceCalculator the capacity scheduler considers only memory, so vcore requests are not reflected in the allocation, while DominantResourceCalculator takes CPU into account as well. So you should add (or edit) the following property in capacity-scheduler.xml:

<property>
 <name>yarn.scheduler.capacity.resource-calculator</name>
 <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
 <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>

The TaskManager memory is, for example, 1400 MB, but Flink reserves some amount for off-heap memory, so the actual heap size is smaller.

This is controlled by two settings:

containerized.heap-cutoff-min: default 600MB

containerized.heap-cutoff-ratio: default 15% of TM's memory

That's why your TM's heap size is limited to ~800 MB (1400 - 600).
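
As a quick check of that number, here is a minimal sketch of the arithmetic, assuming the cut-off is the larger of the two defaults above; it is an illustration of the rule described in the answer, not Flink's actual code:

// Illustration of the heap cut-off described above (assumed rule:
// cutoff = max(heap-cutoff-min, heap-cutoff-ratio * container memory)).
public class HeapCutoffSketch {
    public static void main(String[] args) {
        int tmMemoryMb = 1400;      // container memory from the question
        int cutoffMinMb = 600;      // containerized.heap-cutoff-min default
        double cutoffRatio = 0.15;  // containerized.heap-cutoff-ratio default
        int cutoffMb = Math.max(cutoffMinMb, (int) (cutoffRatio * tmMemoryMb)); // max(600, 210) = 600
        System.out.println("Approximate TM heap: " + (tmMemoryMb - cutoffMb) + " MB"); // 800 MB
    }
}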

Regards,

Kien

