Yarn container understanding and tuning


Problem description


Hi, we have recently upgraded from MR1 to YARN. I know that a container is an abstract notion, but I don't understand how many JVM tasks (map, reduce, filter, etc.) one container can spawn; or, to ask it another way, is a container reusable across multiple map or reduce tasks? I read the following in the blog post "What is a container in YARN?":

"each mapper and reducer runs on its own container to be accurate!" which means if I look at AM logs I should see number of container allocated equal to number of map tasks (failed|success) plus number of reduce task is that correct?

I know the number of containers changes during the application life cycle, based on AM requests, splits, the scheduler, etc.

But is there a way to request an initial or minimum number of containers for a given application? I think one way is to configure a fair-scheduler queue, but is there anything else that can dictate this?

In the case of MR, say I have mapreduce.map.memory.mb = 3gb and mapreduce.map.cpu.vcores = 4. I also have yarn.scheduler.minimum-allocation-mb = 1024m and yarn.scheduler.minimum-allocation-vcores = 1.

Does that mean I will get one container with 4 cores, or 4 containers with one core each?

Also, it's not clear where you can specify mapreduce.map.memory.mb and mapreduce.map.cpu.vcores. Should they be set on the client node, or can they be set per application as well?

Also, from the RM UI or AM UI, is there a way to see the currently assigned containers for a given application?

Solution

A container is a logical entity. It grants an application the right to use a specific amount of resources (memory, CPU, etc.) on a specific host (NodeManager). A container cannot be re-used across map and reduce tasks for the same application.

For example, I have a MapReduce application which spawns 10 mappers:

I am running this on a single host with 8 vCores (this value is determined by the configuration parameter yarn.nodemanager.resource.cpu-vcores). By default, it is set to 8; please check "YarnConfiguration.java":

  /** Number of Virtual CPU Cores which can be allocated for containers.*/
  public static final String NM_VCORES = NM_PREFIX + "resource.cpu-vcores";
  public static final int DEFAULT_NM_VCORES = 8;

Since there are 10 mappers and 1 ApplicationMaster, the total number of containers spawned is 11.

So, for each map/reduce task a different container gets launched.

But, in YARN, for MapReduce jobs there is the concept of an uber job, which enables the user to run multiple mappers and at most 1 reducer in a single container (https://hadoop.apache.org/docs/r2.4.1/hadoop-yarn/hadoop-yarn-common/yarn-default.xml: "CURRENTLY THE CODE CANNOT SUPPORT MORE THAN ONE REDUCE and will ignore larger values."). A configuration sketch for enabling uber mode is shown below.
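
As a rough illustration of that note, uber mode is controlled by a handful of job-level properties (mapreduce.job.ubertask.*). The following is a minimal, hedged sketch of enabling it from the job client, assuming the Hadoop 2.x Configuration/Job API; the threshold values shown are the documented defaults, and the class and job names are placeholders.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class UberModeSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();

          // Ask the MapReduce ApplicationMaster to run the whole job inside its own
          // container ("uber" mode), provided the job passes the thresholds below.
          conf.setBoolean("mapreduce.job.ubertask.enable", true);
          // Documented defaults: at most 9 maps and at most 1 reduce can be uberized;
          // as the yarn-default.xml note says, larger reduce values are ignored.
          conf.setInt("mapreduce.job.ubertask.maxmaps", 9);
          conf.setInt("mapreduce.job.ubertask.maxreduces", 1);

          Job job = Job.getInstance(conf, "uber-mode-sketch"); // placeholder job name
          // ... set mapper/reducer classes and input/output paths, then submit as usual.
      }
  }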

  1. There is no configuration parameter available to specify a minimum number of containers. It is the responsibility of the ApplicationMaster to request the number of containers needed (a sketch of how an AM asks for containers follows after this list).

  2. yarn.scheduler.minimum-allocation-mb - Determines the minimum allocation of memory for each container (yarn.scheduler.maximum-allocation-mb determines the maximum allocation for every container request)

    yarn.scheduler.minimum-allocation-vcores - Determines the minimum allocation of vCores for each container (yarn.scheduler.maximum-allocation-vcores determines the maximum allocation for every container request)

    In your case, you are requesting mapreduce.map.memory.mb = 3gb (3 GB) and mapreduce.map.cpu.vcores = 4 (4 vCores).

    So, you will get 1 container with 4 vCores and 3 GB of memory for each mapper, not 4 containers with 1 vCore each (assuming yarn.scheduler.maximum-allocation-vcores is >= 4). The minimum-allocation settings only define the floor and the rounding granularity for a single container request; they never split one request into multiple containers.

  3. The parameters "mapreduce.map.memory.mb" and "mapreduce.map.cpu.vcores" are set in the mapred-site.xml file. If these configuration parameters are not marked "final", they can be overridden in the client before submitting the job (see the client-side sketch after this list).

  4. Yes. From the "Application Attempt" page for the application in the RM web UI, you can see the number of allocated containers.
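
Regarding point 1, the container count is driven entirely by what the ApplicationMaster requests. The sketch below is not the MapReduce AM's actual code; it is a hedged illustration, using the public AMRMClient API of Hadoop 2.x, of how an AM could ask for one 3 GB / 4-vCore container per map task (10 here). The empty registration values, the priority, and the loop count are placeholders.

  import org.apache.hadoop.yarn.api.records.Priority;
  import org.apache.hadoop.yarn.api.records.Resource;
  import org.apache.hadoop.yarn.client.api.AMRMClient;
  import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
  import org.apache.hadoop.yarn.conf.YarnConfiguration;

  public class ContainerRequestSketch {
      public static void main(String[] args) throws Exception {
          AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
          rmClient.init(new YarnConfiguration());
          rmClient.start();
          // Register this AM with the ResourceManager (host/port/tracking URL are placeholders).
          rmClient.registerApplicationMaster("", 0, "");

          // One request per map task: 3072 MB of memory and 4 vCores each.
          Resource capability = Resource.newInstance(3072, 4);
          Priority priority = Priority.newInstance(0);
          for (int i = 0; i < 10; i++) {
              rmClient.addContainerRequest(new ContainerRequest(capability, null, null, priority));
          }
          // The RM grants containers asynchronously; a real AM would call rmClient.allocate(...)
          // in a loop and launch its tasks in the containers it receives.
      }
  }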

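For point 3 (and the 3 GB / 4-vCore scenario in point 2), here is a minimal sketch of overriding the map container resources in the client before submission, assuming the Hadoop 2.x Job/Configuration API. Note that mapreduce.map.memory.mb is given in MB, so 3 GB is 3072; the heap setting and job name are illustrative assumptions.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class MapResourceOverrideSketch {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration(); // picks up mapred-site.xml / yarn-site.xml

          // Per-job override of the map container size (allowed unless the cluster
          // marks these properties as final):
          conf.setInt("mapreduce.map.memory.mb", 3072); // 3 GB per map container
          conf.setInt("mapreduce.map.cpu.vcores", 4);   // 4 vCores per map container
          // The map JVM heap is usually set somewhat below the container size:
          conf.set("mapreduce.map.java.opts", "-Xmx2560m");

          Job job = Job.getInstance(conf, "map-resource-override"); // placeholder job name
          // ... configure mapper/reducer and input/output paths, then submit as usual.
      }
  }

The same overrides can also be passed on the command line, e.g. -Dmapreduce.map.memory.mb=3072, when the driver uses ToolRunner/GenericOptionsParser.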