Specifying memory limits with hadoop


Problem description

I am trying to run a high-memory job on a Hadoop cluster (0.20.203). I modified the mapred-site.xml to enforce some memory limits.

  <!-- upper bound, in MB, that any job may request per map/reduce task -->
  <property>
    <name>mapred.cluster.max.map.memory.mb</name>
    <value>4096</value>
  </property>
  <property>
    <name>mapred.cluster.max.reduce.memory.mb</name>
    <value>4096</value>
  </property>
  <!-- size, in MB, of a single map/reduce slot on each TaskTracker -->
  <property>
    <name>mapred.cluster.map.memory.mb</name>
    <value>2048</value>
  </property>
  <property>
    <name>mapred.cluster.reduce.memory.mb</name>
    <value>2048</value>
  </property>

In my job, I am specifying how much memory I will need. Unfortunately, even though I am running my process with -Xmx2g (the job will run just fine with this much memory as a console application), I need to request much more memory for my mapper (as a subquestion, why is this?) or it is killed.

import org.apache.hadoop.conf.Configuration

val conf = new Configuration()
conf.set("mapred.child.java.opts", "-Xms256m -Xmx2g -XX:+UseSerialGC") // child JVM heap/GC flags
conf.set("mapred.job.map.memory.mb", "4096")    // per-task memory requests, in MB
conf.set("mapred.job.reduce.memory.mb", "1024")

The reducer needs hardly any memory since I am using an identity reducer.

  import org.apache.hadoop.mapreduce.Reducer
  // needed so the java.lang.Iterable can be iterated in the for comprehension
  import scala.collection.JavaConversions._

  class IdentityReducer[K, V] extends Reducer[K, V, K, V] {
    // pass every (key, value) pair through unchanged
    override def reduce(key: K,
        values: java.lang.Iterable[V],
        context: Reducer[K, V, K, V]#Context) {
      for (v <- values) {
        context.write(key, v)
      }
    }
  }

However, the reducer is still using a lot of memory. Is it possible to give the reducer different JVM arguments than the mapper? Hadoop kills the reducer and claims it is using 3960 MB of memory! And the reducers end up failing the job. How is this possible?

TaskTree [pid=10282,tipID=attempt_201111041418_0005_r_000000_0] is running beyond memory-limits.
Current usage : 4152717312bytes.
Limit : 1073741824bytes.
Killing task.

UPDATE: even when I specify a streaming job with cat as the mapper and uniq as the reducer and -Xms512M -Xmx1g -XX:+UseSerialGC, my tasks take over 2g of virtual memory! This seems extravagant at 4x the max heap size.

TaskTree [pid=3101,tipID=attempt_201111041418_0112_m_000000_0] is running beyond memory-limits.
Current usage : 2186784768bytes.
Limit : 2147483648bytes.
Killing task.
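
For reference, the streaming test described in the update above would be invoked roughly like this (the streaming jar path and the input/output paths are assumptions about a stock 0.20.203 layout, not details from the original post; generic -D options go before the streaming-specific ones):

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
      -D mapred.child.java.opts="-Xms512M -Xmx1g -XX:+UseSerialGC" \
      -input /path/to/input \
      -output /path/to/output \
      -mapper /bin/cat \
      -reducer /usr/bin/uniq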

Update: the original JIRA for changing the configuration format for memory usage specifically mentions that Java users are mostly interested in physical memory to prevent thrashing. I think this is exactly what I want: I don't want a node to spin up a mapper if there is inadequate physical memory available. However, these options all seem to have been implemented as virtual memory constraints, which are difficult to manage.

Recommended answer

Check your ulimit. From Cloudera, on version 0.20.2, but a similar issue probably applies for later versions:

...if you set mapred.child.ulimit, it's important that it be more than two times the heap size value set in mapred.child.java.opts. For example, if you set a 1G heap, set mapred.child.ulimit to 2.5GB. Child processes are now guaranteed to fork at least once, and the fork momentarily requires twice the overhead in virtual memory.
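
As a minimal sketch of the first suggestion, the inherited limits can be inspected on a TaskTracker node with the standard shell builtins (ulimit -v reports the per-process virtual memory limit in kilobytes):

  # per-process virtual memory limit inherited from this shell, in KB
  ulimit -v
  # full list of limits for comparison (max memory size, open files, ...)
  ulimit -a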

It's also possible that setting mapred.child.java.opts programmatically is "too late"; you might want to verify it really is going into effect, and put it in your mapred-site.xml if not.
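
As a concrete starting point, a minimal mapred-site.xml sketch along the lines of the quoted advice might look like this (the values are illustrative, sized for the 1 GB heap / 2.5 GB ulimit example above; note that mapred.child.ulimit is expressed in kilobytes):

  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xms256m -Xmx1g -XX:+UseSerialGC</value>
  </property>
  <property>
    <name>mapred.child.ulimit</name>
    <!-- roughly 2.5 GB expressed in KB, i.e. more than twice the 1 GB heap -->
    <value>2621440</value>
  </property>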
