Yarn: How to utilize full cluster resources?


Problem Description

So I have a Cloudera cluster with 7 worker nodes, each with:

  • 30GB RAM
  • 4 vCPUs

Here are some of the configurations I found important (via Google) for tuning the performance of my cluster. I am running with the following (a config-file sketch follows this list):

  • yarn.nodemanager.resource.cpu-vcores => 4
  • yarn.nodemanager.resource.memory-mb => 17GB (Rest reserved for OS and other processes)
  • mapreduce.map.memory.mb => 2GB
  • mapreduce.reduce.memory.mb => 2GB
  • Running nproc => 4 (Number of processing units available)
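
For reference, here is a minimal sketch of how these values would typically be expressed in yarn-site.xml and mapred-site.xml (17 GB taken as 17408 MB). The numbers simply mirror the list above and are illustrative rather than a recommendation:

    <!-- yarn-site.xml (illustrative values from the list above) -->
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>4</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>17408</value> <!-- ~17 GB; the rest is left for the OS and other processes -->
    </property>

    <!-- mapred-site.xml (illustrative values from the list above) -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value> <!-- 2 GB per map container -->
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value> <!-- 2 GB per reduce container -->
    </property>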

Now my concern is: when I look at my ResourceManager, I see Available Memory as 119 GB, which is fine. But when I run a heavy sqoop job and my cluster is at its peak, it uses only ~59 GB of memory, leaving ~60 GB unused.

One way I can see to fix this unused-memory issue is to increase mapreduce.map.memory.mb and mapreduce.reduce.memory.mb to 4 GB, so that we can use up to 16 GB per node.
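
As a sketch, that proposal would amount to something like the following in mapred-site.xml (4096 MB is the 4 GB figure above; with 4 containers per node that gives 4 x 4 GB = 16 GB of the 17 GB advertised to YARN):

    <!-- mapred-site.xml (sketch of the proposed change): 4 GB map and reduce containers -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>4096</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>4096</value>
    </property>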

The other way is to increase the number of containers, but I am not sure how to do that.

  • 4 cores x 7 nodes = 28 possible containers. 3 are being used by other processes, and only 5 are currently available for the sqoop job.

What would be the right configuration to improve cluster performance in this case? Can I increase the number of containers, say to 2 containers per core, and is that recommended?

Any help or suggestions on the cluster configuration would be highly appreciated. Thanks.

Solution

If your input data is in 26 splits, YARN will create 26 mappers to process those splits in parallel.

If you have 7 nodes running 2 GB mappers for those 26 splits, the distribution should be something like:

  • Node1 : 4 mappers => 8 GB
  • Node2 : 4 mappers => 8 GB
  • Node3 : 4 mappers => 8 GB
  • Node4 : 4 mappers => 8 GB
  • Node5 : 4 mappers => 8 GB
  • Node6 : 3 mappers => 6 GB
  • Node7 : 3 mappers => 6 GB
  • Total : 26 mappers => 52 GB

So if all mappers run at the same time, the total memory used by your MapReduce job will be 26 x 2 = 52 GB. If you add the memory used by the reducer(s) and the ApplicationMaster container, you can probably reach your 59 GB at some point, as you said.

If this is the behaviour you are witnessing, and the job is finished after those 26 mappers, then there is nothing wrong. You only need around 60 GB to complete your job by spreading tasks across all your nodes without needing to wait for container slots to free themselves. The other free 60 GB are just waiting around, because you don't need them. Increasing heap size just to use all the memory won't necessarily improve performance.

Edited:

However, if you still have lots of mappers waiting to be scheduled, then maybe it's because your installation is configured to take vcores into account when calculating container allocation as well. This is not the default in Apache Hadoop, but it can be configured:

yarn.scheduler.capacity.resource-calculator: The ResourceCalculator implementation to be used to compare Resources in the scheduler. The default, i.e. org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator, only uses Memory, while DominantResourceCalculator uses Dominant-resource to compare multi-dimensional resources such as Memory, CPU, etc. A Java ResourceCalculator class name is expected.
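
If your cluster runs the CapacityScheduler, this is roughly what that setting looks like in capacity-scheduler.xml (a sketch; the property and class names come from the description above, and whether your installation actually sets it this way is what you would need to verify):

    <!-- capacity-scheduler.xml: with DominantResourceCalculator, vcores count
         toward container allocation in addition to memory -->
    <property>
      <name>yarn.scheduler.capacity.resource-calculator</name>
      <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
    </property>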

Since you set yarn.nodemanager.resource.cpu-vcores to 4, and since each mapper uses 1 vcore by default, you can only run 4 mappers per node at a time.

In that case, you can double your value of yarn.nodemanager.resource.cpu-vcores to 8. It's just an arbitrary value, but it should double the number of mappers that can run at once.
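
A sketch of that change in yarn-site.xml (the value 8 is just the arbitrary doubling suggested above, not a tuned recommendation):

    <!-- yarn-site.xml: advertise 8 vcores per NodeManager instead of 4, so roughly
         twice as many 1-vcore mappers can be scheduled on each node -->
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>8</value>
    </property>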
