Hadoop: number of available map slots based on cluster size
Question
Reading the syslog generated by Hadoop, I can see lines similar to this one:
2013-05-06 16:32:45,118 INFO org.apache.hadoop.mapred.JobClient (main): Setting default number of map tasks based on cluster size to : 84
Does anyone know how this value is computed? And how can I get this value in my program?
I grepped the source code of Hadoop and did not find the string "Setting default number of map tasks based on cluster size to" at all (whereas I do find other strings that are printed when running MR jobs). Furthermore, this string is not printed anywhere in my local installation. A Google search for it lists problems on AWS with EMR.
As you confirmed, you're in fact using Amazon Elastic MapReduce. I believe EMR has some modifications of its own to the JobClient class of Hadoop, which output this particular line.
As far as computing this number is concerned, I would suspect it is computed from characteristics like the total number of (active) nodes in the cluster (N) and the number of map slots per node (M), i.e. N * M. However, additional AWS-specific resource (memory) constraints may also be taken into account. You'd have to ask in EMR-related forums for the exact formula.
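To make the suspected N * M relationship concrete, here is a minimal sketch. Note this is an assumption about EMR's behavior, not its actual code; the figures (21 nodes with 4 map slots each) are hypothetical values chosen only because they reproduce the 84 seen in the log line above:

```java
public class DefaultMapTasks {

    // Suspected formula: total (active) nodes times map slots per node.
    static int defaultMapTasks(int activeNodes, int mapSlotsPerNode) {
        return activeNodes * mapSlotsPerNode;
    }

    public static void main(String[] args) {
        // Hypothetical example: 21 active nodes x 4 map slots = 84,
        // which would match "Setting default number of map tasks ... to : 84".
        System.out.println(defaultMapTasks(21, 4));
    }
}
```

Again, EMR may cap or adjust this figure based on memory or instance type, so treat the product only as a starting guess.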
Additionally, the JobClient exposes a set of information about the cluster. Using the method JobClient#getClusterStatus() you can access information like:
- Size of the cluster.
- Name of the trackers.
- Number of blacklisted/active trackers.
- Task capacity of the cluster.
- The number of currently running map & reduce tasks.
via the ClusterStatus class object, so you can try and compute the desired number in your program manually.
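A sketch of querying that information with the old mapred API might look as follows. This needs a live cluster (or at least a correctly configured JobConf) to run, so take it as an outline rather than a standalone program:

```java
import org.apache.hadoop.mapred.ClusterStatus;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class ClusterInfo {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();              // picks up cluster config
        JobClient client = new JobClient(conf);
        ClusterStatus status = client.getClusterStatus();

        int trackers = status.getTaskTrackers();   // active tracker count
        int maxMaps  = status.getMaxMapTasks();    // map task capacity of the cluster
        int running  = status.getMapTasks();       // currently running map tasks

        System.out.println("Trackers: " + trackers
                + ", map capacity: " + maxMaps
                + ", running maps: " + running);
    }
}
```

If the default really is N * M, then getMaxMapTasks() should already give you that product directly, since it reports the cluster-wide map slot capacity.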