Hadoop, MapReduce: how to add a second node to MapReduce?

Question

I have a Hadoop 2.2 cluster of 2 nodes. On the first machine I start:

  • namenode
  • datanode
  • NodeManager
  • ResourceManager
  • JobHistoryServer

On the second I start all those as well, except for namenode:

  • datanode
  • NodeManager
  • ResourceManager
  • JobHistoryServer

My mapred-site.xml on both machines contains:

<property>
  <name>mapred.job.tracker</name>
  <value>firstMachine:54311</value>
</property>

My core-site.xml on both machines contains:

<property>
   <name>fs.default.name</name>
   <value>hdfs://firstMachine:9000</value>
</property>

The console at http://firstMachine:50070 reports 2 nodes:

 Live Nodes     :   2 (Decommissioned: 0)

However, the console at http://firstMachine:8088 (the one with the map reduce job history and all that) keeps saying:

Active Nodes: 1

Also, executing a map reduce job with or without the second machine yields pretty much the same performance. I tried it with the wordcount example, using 4 big files.

My question is: how can I check if my map reduce is actually executed on multiple (2 in this case) machines, and not just the one where it is launched?

If my Hadoop map reduce in fact does NOT see the other Hadoop instance, how do I make it see it (how can I configure it to run the map reduce on 2 machines)?

Solution

OK, I've found the answer. Apparently, in version 2.2 most (all?) of the settings that were related to mapred have moved to YARN. So instead of using the mapred-site.xml file, I had to use the yarn-site.xml file and add this to it:

<property>
 <name>yarn.resourcemanager.hostname</name>
 <value>firstMachine</value>
</property>

(Note that I didn't have to add the port; only the host is declared here. The ports keep their default values.)
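To make the "default ports" remark concrete, here is a small sketch that reads a yarn-site.xml and shows which ResourceManager addresses follow from the hostname alone. The port numbers are the defaults from yarn-default.xml as I understand them for Hadoop 2.x; the function names are my own, not part of any Hadoop tooling, so check the values against your version's documentation.

```python
import xml.etree.ElementTree as ET

# Default ResourceManager ports (assumed from yarn-default.xml, Hadoop 2.x).
DEFAULT_RM_PORTS = {
    "yarn.resourcemanager.address": 8032,
    "yarn.resourcemanager.scheduler.address": 8030,
    "yarn.resourcemanager.resource-tracker.address": 8031,
    "yarn.resourcemanager.admin.address": 8033,
    "yarn.resourcemanager.webapp.address": 8088,
}


def read_properties(xml_text):
    """Parse a Hadoop-style <configuration> document into a name -> value dict."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def effective_rm_addresses(props):
    """Return each ResourceManager address, falling back to hostname:default_port."""
    # 0.0.0.0 is the assumed default when no hostname is configured.
    host = props.get("yarn.resourcemanager.hostname", "0.0.0.0")
    return {
        key: props.get(key, f"{host}:{port}")
        for key, port in DEFAULT_RM_PORTS.items()
    }
```

Passing the contents of yarn-site.xml to `read_properties` and the result to `effective_rm_addresses` lists the addresses a NodeManager on the second machine would try to reach. The resource-tracker address is the one NodeManagers register with, so a second node that cannot resolve it would not show up as active.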

Now the console displays 2 active nodes, and the map/reduce job is about 20% faster.
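The node count shown by the console on port 8088 can also be checked from a script, which helps answer the "is the second machine really used?" part of the question. The sketch below assumes the ResourceManager REST endpoint `/ws/v1/cluster/nodes` is reachable on the web UI port and that its JSON has the `nodes` / `node` / `nodeHostName` / `state` layout I remember from the Hadoop 2.x docs; the function names are mine.

```python
import json
from urllib.request import urlopen


def running_node_hosts(payload):
    """Return host names of NodeManagers the ResourceManager reports as RUNNING."""
    # "nodes" can be null when no NodeManager has registered yet.
    nodes = (payload.get("nodes") or {}).get("node") or []
    return [n["nodeHostName"] for n in nodes if n.get("state") == "RUNNING"]


def fetch_nodes(rm_host="firstMachine", port=8088):
    """Fetch the node list from the ResourceManager web services (assumed endpoint)."""
    url = f"http://{rm_host}:{port}/ws/v1/cluster/nodes"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

Calling `running_node_hosts(fetch_nodes())` on a machine that can reach firstMachine should list both hosts once the second NodeManager has registered. While a job is running, the per-node container counts on the same console page indicate whether tasks actually land on both machines.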
