How are concurrent # mappers and # reducers calculated in Hadoop 2 + YARN?


Question


I've searched for some time, and I've found that a MapReduce cluster using hadoop2 + yarn has the following number of concurrent maps and reduces per node:

Concurrent Maps # = yarn.nodemanager.resource.memory-mb / mapreduce.map.memory.mb
Concurrent Reduces # = yarn.nodemanager.resource.memory-mb / mapreduce.reduce.memory.mb

However, I've set up a cluster with 10 machines, with these configurations:

'yarn_site' => {
  'yarn.nodemanager.resource.cpu-vcores' => '32',
  'yarn.nodemanager.resource.memory-mb' => '16793',
  'yarn.scheduler.minimum-allocation-mb' => '532',
  'yarn.nodemanager.vmem-pmem-ratio' => '5',
  'yarn.nodemanager.pmem-check-enabled' => 'false'
},
'mapred_site' => {
  'mapreduce.map.memory.mb' => '4669',
  'mapreduce.reduce.memory.mb' => '4915',
  'mapreduce.map.java.opts' => '-Xmx4669m',
  'mapreduce.reduce.java.opts' => '-Xmx4915m'
}
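
Plugging these values into the formulas above, here is a rough back-of-the-envelope check of what I expected (a sketch using integer division; note that YARN also rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, so the real per-node numbers could differ slightly):

# Expected concurrent containers, using only the formulas and values above.
node_mem_mb   = 16793   # yarn.nodemanager.resource.memory-mb
map_mem_mb    = 4669    # mapreduce.map.memory.mb
reduce_mem_mb = 4915    # mapreduce.reduce.memory.mb
nodes         = 10

maps_per_node    = node_mem_mb // map_mem_mb     # 16793 // 4669 = 3
reduces_per_node = node_mem_mb // reduce_mem_mb  # 16793 // 4915 = 3
print(maps_per_node * nodes)     # ~30 concurrent maps cluster-wide
print(reduces_per_node * nodes)  # ~30 concurrent reduces cluster-wide

So by this math the cluster should run roughly 30 map containers at once.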

But after the cluster is set up, hadoop allows 6 containers for the entire cluster. What am I forgetting? What am I doing wrong?

Solution

Not sure if this is the same issue you're having, but I ran into a similar one: I launched an EMR cluster with 20 c3.8xlarge nodes in the core instance group, and likewise found the cluster severely underutilized when running a job. Only 30 mappers ran concurrently across the entire cluster, even though the YARN and MapReduce memory/vcore settings for that cluster indicated that over 500 concurrent containers could run. I was using Hadoop 2.4.0 on AMI 3.5.0.

It turns out that the instance group matters for some reason. When I relaunched the cluster with 20 nodes in the task instance group and only 1 core node, it made a HUGE difference: I got 500+ mappers running concurrently (in my case, the mappers were mostly downloading files from S3, so they didn't need HDFS).
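
For anyone wanting to reproduce that layout programmatically, here is a minimal sketch with boto3. This is illustrative, not the script I actually used: the cluster name and the default EMR IAM role names (EMR_EC2_DefaultRole, EMR_DefaultRole) are assumptions.

# Hypothetical boto3 sketch of the "mostly task nodes" layout described above.
import boto3

emr = boto3.client('emr')
emr.run_job_flow(
    Name='task-heavy-cluster',           # illustrative name
    AmiVersion='3.5.0',                  # matches the AMI used in this answer
    JobFlowRole='EMR_EC2_DefaultRole',   # assumed default EC2 instance role
    ServiceRole='EMR_DefaultRole',       # assumed default EMR service role
    Instances={
        'InstanceGroups': [
            {'InstanceRole': 'MASTER', 'InstanceType': 'c3.8xlarge', 'InstanceCount': 1},
            {'InstanceRole': 'CORE',   'InstanceType': 'c3.8xlarge', 'InstanceCount': 1},
            # Put the bulk of the capacity in the TASK group.
            {'InstanceRole': 'TASK',   'InstanceType': 'c3.8xlarge', 'InstanceCount': 20},
        ],
        'KeepJobFlowAliveWhenNoSteps': True,
    },
)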

I'm not sure why the instance group type makes a difference, given that both types can run tasks equally well, but clearly they are treated differently.

I thought I'd mention it here, given that I ran into this issue myself and using a different group type helped.

