Too many fetch failures


Problem description


I have set up a 2-node Hadoop cluster on Ubuntu 12.04 with Hadoop 1.2.1. When I try to run the Hadoop word count example, I get a "Too many fetch failures" error. I have read many articles but cannot figure out what the entries in the masters, slaves and /etc/hosts files should be. My node names are "master" with IP 10.0.0.1 and "slaveone" with IP 10.0.0.2.

What should the entries be in the masters, slaves and /etc/hosts files on both the master and the slave node?

Solution

If you're unable to upgrade the cluster for whatever reason, you can try the following:

  1. Ensure that your hostname is bound to the network IP and NOT 127.0.0.1 in /etc/hosts
  2. Ensure that you're using only hostnames and not IPs to reference services.
  3. If the above are correct, try the following settings:


set mapred.reduce.slowstart.completed.maps=0.80
set tasktracker.http.threads=80
set mapred.reduce.parallel.copies=(>=10)  (10 should probably be sufficient)
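
For what it's worth, on Hadoop 1.x these three properties would normally be set in conf/mapred-site.xml on the cluster nodes (or passed per job with -D). A minimal sketch, assuming the values suggested above:

<!-- conf/mapred-site.xml (sketch only; values taken from the suggestions above) -->
<configuration>
  <property>
    <name>mapred.reduce.slowstart.completed.maps</name>
    <value>0.80</value>
  </property>
  <property>
    <name>tasktracker.http.threads</name>
    <value>80</value>
  </property>
  <property>
    <name>mapred.reduce.parallel.copies</name>
    <value>10</value>
  </property>
</configuration>

Note that tasktracker.http.threads is read by the TaskTracker daemon itself, so the TaskTrackers would need a restart for it to take effect; the other two are job-level settings.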

Also check out this SO post: Why I am getting "Too many fetch-failures" every other day

And this one: Too many fetch failures: Hadoop on cluster (x2)

And also this one, if the above don't help: http://grokbase.com/t/hadoop/common-user/098k7y5t4n/how-to-deal-with-too-many-fetch-failures For brevity and in the interest of time, I'm putting what I found to be the most pertinent part here.

The number 1 cause of this is something that causes a connection to get a map output to fail. I have seen:
1) firewall
2) misconfigured ip addresses (ie: the task tracker attempting the fetch received an incorrect ip address when it looked up the name of the tasktracker with the map segment)
3) rare, the http server on the serving tasktracker is overloaded due to insufficient threads or listen backlog; this can happen if the number of fetches per reduce is large and the number of reduces or the number of maps is very large.
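
A quick way to rule out cause 2 is to check name resolution from each node. A hedged sketch, using the hostnames and IPs from the question:

# Run on both nodes; each hostname should resolve to its network IP,
# not to 127.0.0.1 (hostnames/IPs below are the ones from the question).
getent hosts master slaveone
# expected output for that setup:
# 10.0.0.1   master
# 10.0.0.2   slaveone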

There are probably other cases; this recently happened to me when I had 6000 maps and 20 reducers on a 10-node cluster, which I believe was case 3 above. Since I didn't actually need to reduce (I got my summary data via counters in the map phase), I never re-tuned the cluster.

EDIT: Original answer said "Ensure that your hostname is bound to the network IP and 127.0.0.1 in /etc/hosts"
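
To make point 1 concrete for the 2-node setup in the question, a plausible layout would look like the following. This is only a sketch: it assumes the hostnames and IPs given in the question and the standard Hadoop 1.x conf/masters and conf/slaves files.

# /etc/hosts, identical on both nodes; the node hostnames map to their
# network IPs, not to 127.0.0.1
127.0.0.1    localhost
10.0.0.1     master
10.0.0.2     slaveone

# conf/masters on the master node (in Hadoop 1.x this lists the host
# that runs the SecondaryNameNode, not the "master" in general)
master

# conf/slaves on the master node (hosts that run DataNode/TaskTracker);
# drop "master" from this list if it should not run worker daemons
master
slaveone

Also make sure any "127.0.1.1 <hostname>" line that Ubuntu adds by default is removed or changed, since that is a common way for point 1 to be violated.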
