How are containers created based on vcores and memory in MapReduce2?


Problem description


I have a tiny cluster composed of 1 master (namenode, secondarynamenode, resourcemanager) and 2 slaves (datanode, nodemanager).

I have set in the yarn-site.xml of the master:

  • yarn.scheduler.minimum-allocation-mb: 512
  • yarn.scheduler.maximum-allocation-mb: 1024
  • yarn.scheduler.minimum-allocation-vcores: 1
  • yarn.scheduler.maximum-allocation-vcores: 2

I have set in the yarn-site.xml of the slaves:

  • yarn.nodemanager.resource.memory-mb: 2048
  • yarn.nodemanager.resource.cpu-vcores: 4

Then in the master, I have set in mapred-site.xml:

  • mapreduce.map.memory.mb: 512
  • mapreduce.map.java.opts: -Xmx500m
  • mapreduce.map.cpu.vcores: 1
  • mapreduce.reduce.memory.mb: 512
  • mapreduce.reduce.java.opts: -Xmx500m
  • mapreduce.reduce.cpu.vcores: 1

So it is my understanding that when running a job, the MapReduce ApplicationMaster will try to create as many 512 MB / 1 vCore containers as possible on both slaves. Since each slave has only 2048 MB and 4 vCores available, that gives room for 4 containers per slave. This is precisely what happens with my jobs, so no problem so far.
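The expected container count per NodeManager is simple division over the node's capacity. A quick sketch of both the memory-only and the memory-plus-vCores calculations (hypothetical helper, not Hadoop code):

```java
// Rough sketch (not Hadoop code): how many containers a NodeManager could
// host if the scheduler packed by memory alone, vs. if it also honored vCores.
public class ContainerMath {
    // Memory-only packing: floor of node memory over per-container memory.
    static int byMemory(int nodeMemMb, int containerMemMb) {
        return nodeMemMb / containerMemMb;
    }

    // Packing constrained by both resources: the tighter of the two limits wins.
    static int byBoth(int nodeMemMb, int containerMemMb,
                      int nodeVcores, int containerVcores) {
        return Math.min(nodeMemMb / containerMemMb, nodeVcores / containerVcores);
    }

    public static void main(String[] args) {
        System.out.println(byMemory(2048, 512));     // prints 4 (the observed behavior)
        System.out.println(byBoth(2048, 512, 4, 2)); // prints 2 (if vCores were enforced)
    }
}
```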

However, when I increase mapreduce.map.cpu.vcores and mapreduce.reduce.cpu.vcores from 1 to 2, there should theoretically be only enough vCores available to create 2 containers per slave, right? But no, I still get 4 containers per slave.

I then tried to increase mapreduce.map.memory.mb and mapreduce.reduce.memory.mb from 512 to 768. This leaves room for 2 containers (2048/768 = 2).

It doesn't matter whether the mappers' and reducers' vCores are set to 1 or 2: this always yields 2 containers of 768 MB, or 4 containers of 512 MB, per slave. So what are vCores for? The ApplicationMaster doesn't seem to care.

Also, when setting the memory to 768 and vCores to 2, I see this info displayed in the NodeManager UI for a mapper container:

The 768 MB has turned into 1024 TotalMemoryNeeded, and the 2 vCores are ignored and displayed as 1 TotalVCoresNeeded.

So, to break down the "how does it work" question into multiple questions:

  1. Is only memory used (and vCores ignored) to calculate the number of containers?
  2. Is the mapreduce.map.memory.mb value only a completely abstract value used to calculate the number of containers (which is why it can be rounded up to the next power of 2)? Or does it represent a real memory allocation in some way?
  3. Why do we specify some -Xmx value in mapreduce.map.java.opts? Why doesn't YARN use the value from mapreduce.map.memory.mb to allocate memory to the container?
  4. What is TotalVCoresNeeded, and why is it always equal to 1? I tried changing mapreduce.map.cpu.vcores on all nodes (master and slaves) but it never changes.

Recommended answer

I will answer this question on the assumption that the scheduler used is the CapacityScheduler.

The CapacityScheduler uses a ResourceCalculator to calculate the resources needed for an application. There are 2 types of resource calculators:

  1. DefaultResourceCalculator: takes only memory into account for resource calculations (i.e. for calculating the number of containers)
  2. DominantResourceCalculator: takes both memory and CPU into account for resource calculations

By default, the CapacityScheduler uses the DefaultResourceCalculator. If you want to use the DominantResourceCalculator, you need to set the following property in the capacity-scheduler.xml file:

  <property>
    <name>yarn.scheduler.capacity.resource-calculator</name>
    <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
  </property>

Now, to answer your questions:

  1. If the DominantResourceCalculator is used, then both memory and vCores are taken into account when calculating the number of containers.

  2. mapreduce.map.memory.mb is not an abstract value. It is taken into consideration when calculating the resources.

The DominantResourceCalculator class has a normalize() function, which normalizes the resource request using a minimum resource (determined by the config yarn.scheduler.minimum-allocation-mb), a maximum resource (determined by yarn.scheduler.maximum-allocation-mb) and a step factor (also determined by yarn.scheduler.minimum-allocation-mb).

The code for normalizing memory looks like this (see org.apache.hadoop.yarn.util.resource.DominantResourceCalculator.java):

  int normalizedMemory = Math.min(roundUp(
      Math.max(r.getMemory(), minimumResource.getMemory()),
      stepFactor.getMemory()), maximumResource.getMemory());

Where:

r = requested memory

The logic works as follows:

a. Take the max of (requested resource, minimum resource) = max(768, 512) = 768

b. roundUp(768, stepFactor) = roundUp(768, 512) = 1024

roundUp does: ((768 + (512 - 1)) / 512) * 512 = (1279 / 512) * 512 = 1024 (integer division)

c. min(roundUp(768, stepFactor), maximumResource) = min(1024, 1024) = 1024

So finally, the allotted memory is 1024 MB, which is what you are getting.

For the sake of simplicity, you can say that roundUp increments the demand in steps of 512 MB (the minimum resource).
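The normalization arithmetic above can be sketched as a stand-alone snippet (a hypothetical re-implementation for illustration, not the actual Hadoop class):

```java
// Sketch of the normalize() arithmetic described above (not Hadoop's code).
public class NormalizeSketch {
    // Round 'value' up to the next multiple of 'step'.
    static int roundUp(int value, int step) {
        return ((value + step - 1) / step) * step;
    }

    // min(roundUp(max(requested, minimum), step), maximum)
    static int normalize(int requested, int minimum, int maximum, int step) {
        return Math.min(roundUp(Math.max(requested, minimum), step), maximum);
    }

    public static void main(String[] args) {
        // The question's scenario: request 768 MB with min 512, max 1024, step 512.
        System.out.println(normalize(768, 512, 1024, 512)); // prints 1024
    }
}
```

With these settings every request lands on a multiple of 512 MB, capped at 1024 MB, which is why the UI shows 1024 TotalMemoryNeeded for a 768 MB request.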

  3. Since the mapper is a Java process, mapreduce.map.java.opts is used to specify the heap size for the mapper.

Whereas mapreduce.map.memory.mb is the total memory used by the container.

The value of mapreduce.map.java.opts should be less than mapreduce.map.memory.mb.

The answer here explains this: What is the relation between 'mapreduce.map.memory.mb' and 'mapred.map.child.java.opts' in Apache Hadoop YARN?
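For example, a pairing that leaves headroom for JVM overhead inside the container could look like this (values here are illustrative, not taken from the question):

```xml
  <!-- Illustrative values: heap (-Xmx) kept below the container size -->
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>mapreduce.map.java.opts</name>
    <value>-Xmx800m</value>
  </property>
```

If -Xmx is set at or above mapreduce.map.memory.mb, the JVM can exceed the container's limit and YARN will kill the container.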

  4. When you use the DominantResourceCalculator, it uses the normalize() function to calculate the vCores needed as well.

The code for that (similar to the normalization of memory) is:

  int normalizedCores = Math.min(roundUp(
      Math.max(r.getVirtualCores(), minimumResource.getVirtualCores()),
      stepFactor.getVirtualCores()), maximumResource.getVirtualCores());
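Plugging the question's values into that formula (minimum = 1, maximum = 2, so stepFactor = 1) shows that under the DominantResourceCalculator a request for 2 vCores would survive normalization. A worked sketch (not Hadoop code):

```java
// Worked example of the vCore normalization above (sketch, not Hadoop's code).
public class VcoreNormalize {
    // Round 'value' up to the next multiple of 'step'.
    static int roundUp(int value, int step) {
        return ((value + step - 1) / step) * step;
    }

    // min(roundUp(max(requested, min), step), max)
    static int normalizeCores(int requested, int min, int max, int step) {
        return Math.min(roundUp(Math.max(requested, min), step), max);
    }

    public static void main(String[] args) {
        // min=1, max=2, step=1, from the yarn.scheduler.*-allocation-vcores settings:
        System.out.println(normalizeCores(2, 1, 2, 1)); // prints 2
    }
}
```

So if the UI still shows TotalVCoresNeeded = 1 for a 2-vCore request, the memory-only DefaultResourceCalculator is most likely the one in effect.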
