Why is the Flink container vcore size always 1?
Question
I am running Flink on YARN (more precisely, on an AWS EMR YARN cluster).
According to the Flink documentation and source code, when Flink requests resources from YARN it asks, by default, for as many vcores per TaskManager container as there are slots per TaskManager. I confirmed this in the source code:
    // Resource requirements for worker containers
    int taskManagerSlots = taskManagerParameters.numSlots();
    int vcores = config.getInteger(ConfigConstants.YARN_VCORES,
            Math.max(taskManagerSlots, 1));
    Resource capability = Resource.newInstance(containerMemorySizeMB, vcores);
    resourceManagerClient.addContainerRequest(
            new AMRMClient.ContainerRequest(capability, null, null, priority));
When I start Flink with -yn 1 -ys 3, I expect YARN to allocate 3 vcores to the single TaskManager container. However, the YARN ResourceManager web UI always shows 1 vcore per container, and the ResourceManager logs also show 1 vcore.
I stepped through the Flink source code to the lines pasted above and saw that the value of vcores is 3. This really confuses me. Can anyone help clarify this? Thanks.
Recommended answer
An answer from Kien Truong
You have to enable CPU scheduling in YARN; otherwise it always shows only 1 CPU allocated per container, regardless of how many vcores Flink requests. To do so, add (or edit) the following property in capacity-scheduler.xml:
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <!-- <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value> -->
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
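To illustrate why the setting above changes what YARN reports, here is a toy Java model of the two policies. It is only a sketch under stated assumptions: the class and method names are hypothetical and none of this is YARN's actual code. It assumes that the memory-only policy (standing in for DefaultResourceCalculator) ignores the CPU part of a request, and that the memory-and-CPU policy (standing in for DominantResourceCalculator) honors the requested vcores, raised to the scheduler's minimum allocation if needed.

```java
// Toy model of the two accounting policies; hypothetical names, not YARN's classes.
final class VcoreAccountingSketch {

    /**
     * Models a memory-only calculator: the requested vcores are ignored,
     * so every container is reported with a single vcore.
     */
    static int reportedVcoresMemoryOnly(int requestedVcores) {
        return 1;
    }

    /**
     * Models a memory-and-CPU calculator: the requested vcores are kept,
     * but never fall below the scheduler's minimum vcore allocation.
     */
    static int reportedVcoresWithCpu(int requestedVcores, int minAllocationVcores) {
        return Math.max(requestedVcores, minAllocationVcores);
    }

    public static void main(String[] args) {
        int requested = 3; // what Flink asks for with -ys 3
        System.out.println("memory-only:    " + reportedVcoresMemoryOnly(requested));
        System.out.println("memory-and-CPU: " + reportedVcoresWithCpu(requested, 1));
    }
}
```

In this model a request for 3 vcores is reported as 1 under the memory-only policy and as 3 once CPU is taken into account, which matches the behavior described in the question.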
TaskManager memory is, for example, 1400 MB, but Flink reserves part of it for off-heap memory, so the actual heap size is smaller.
This is controlled by two settings:
containerized.heap-cutoff-min: default 600MB
containerized.heap-cutoff-ratio: default 15% of TM's memory
That's why your TM's heap size is limited to ~800 MB (1400 - 600): 15% of 1400 MB is only 210 MB, so the 600 MB minimum applies.
Regards,
Kien