How to run MapReduce tasks in Parallel with hadoop 2.x?

Question

I would like my map and reduce tasks to run in parallel. However, despite trying every trick in the bag, they still run sequentially. I read in How to set the precise max number of concurrently running tasks per node in Hadoop 2.4.0 on Elastic MapReduce that the number of tasks running in parallel can be set using the following formula:

min (yarn.nodemanager.resource.memory-mb / mapreduce.[map|reduce].memory.mb, 
 yarn.nodemanager.resource.cpu-vcores / mapreduce.[map|reduce].cpu.vcores)

However, I did that, as you can see from the yarn-site.xml and mapred-site.xml I am using below. But the tasks still run sequentially. Note that I am using the open source Apache Hadoop and not Cloudera. Would shifting to Cloudera solve the problem? Also note that my input files are big enough that dfs.block.size should also not be an issue.

yarn-site.xml

    <configuration>
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>131072</value>
    </property>
    <property>
      <name>yarn.nodemanager.resource.cpu-vcores</name>
      <value>64</value>
    </property>
    </configuration>

mapred-site.xml

    <configuration>
    <property>
      <name>mapred.job.tracker</name>
      <value>localhost:9001</value>
    </property>

    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>16384</value>
    </property>

    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>16384</value>
    </property>

    <property>
        <name>mapreduce.map.cpu.vcores</name>
        <value>8</value>
    </property>

    <property>
        <name>mapreduce.reduce.cpu.vcores</name>
        <value>8</value>
    </property>
    </configuration>
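
For reference, plugging the values from these two files into the formula above gives:

    min(131072 / 16384, 64 / 8) = min(8, 8) = 8

so on paper YARN should be able to schedule up to 8 concurrent tasks per node.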

Answer

A container is the logical execution template reserved for running Map/Reduce tasks on every node of the cluster.

The yarn.nodemanager.resource.memory-mb property tells YARN to reserve that much RAM for all the containers dispatched on the node to execute Map/Reduce tasks. This is the upper bound on the memory that can be handed out to containers on that node.

But in your case, the free memory on the node is almost 11 GB, while you have configured yarn.nodemanager.resource.memory-mb to almost 128 GB (131072) and mapreduce.map.memory.mb & mapreduce.reduce.memory.mb to 16 GB. The required upper-bound size for a Map/Reduce container is 16 GB, which is higher than the 11 GB of free memory. This could be the reason that only one container was allocated on the node for execution.
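
Put differently, using the figures above:

    configured per-container request : 16384 MB
    memory actually free on the node : ~11264 MB (about 11 GB)

A single 16 GB container request already exceeds what the node can physically provide.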

We should reduce the values of the mapreduce.map.memory.mb and mapreduce.reduce.memory.mb properties to below the amount of free memory, so that more than one container runs in parallel.
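
As a rough sketch (the 2 GB per-task figure is illustrative, and it assumes roughly 11 GB is genuinely available to the NodeManager):

    <!-- yarn-site.xml: advertise only the memory the node really has -->
    <property>
      <name>yarn.nodemanager.resource.memory-mb</name>
      <value>11264</value>
    </property>

    <!-- mapred-site.xml: request 2 GB per task so several containers fit -->
    <property>
      <name>mapreduce.map.memory.mb</name>
      <value>2048</value>
    </property>
    <property>
      <name>mapreduce.reduce.memory.mb</name>
      <value>2048</value>
    </property>

With these values the formula from the question gives min(11264 / 2048, 64 / 8), i.e. up to 5 containers running in parallel per node. In practice the task JVM heap (mapreduce.[map|reduce].java.opts) is usually set somewhat below the container size as well, e.g. -Xmx1638m for a 2048 MB container.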

Also look at ways to increase the free memory, since more than 90% of it is already in use.

Hope this helps :)
