Too many fetch failures: Hadoop on cluster (x2)


Problem description

I have been using Hadoop for the last week or so (trying to get to grips with it), and although I have been able to set up a multinode cluster (2 machines: 1 laptop and a small desktop) and retrieve results, I always seem to encounter "Too many fetch failures" when I run a Hadoop job.

An example output (on a trivial wordcount example) is:

hadoop@ap200:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-0.20.203.0.jar wordcount sita sita-output3X
11/05/20 15:02:05 INFO input.FileInputFormat: Total input paths to process : 7
11/05/20 15:02:05 INFO mapred.JobClient: Running job: job_201105201500_0001
11/05/20 15:02:06 INFO mapred.JobClient:  map 0% reduce 0%
11/05/20 15:02:23 INFO mapred.JobClient:  map 28% reduce 0%
11/05/20 15:02:26 INFO mapred.JobClient:  map 42% reduce 0%
11/05/20 15:02:29 INFO mapred.JobClient:  map 57% reduce 0%
11/05/20 15:02:32 INFO mapred.JobClient:  map 100% reduce 0%
11/05/20 15:02:41 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:02:49 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000003_0, Status : FAILED
Too many fetch-failures
11/05/20 15:02:53 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:02:57 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:10 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000002_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:14 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:03:17 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:25 INFO mapred.JobClient: Task Id : attempt_201105201500_0001_m_000006_0, Status : FAILED
Too many fetch-failures
11/05/20 15:03:29 INFO mapred.JobClient:  map 85% reduce 9%
11/05/20 15:03:32 INFO mapred.JobClient:  map 100% reduce 9%
11/05/20 15:03:35 INFO mapred.JobClient:  map 100% reduce 28%
11/05/20 15:03:41 INFO mapred.JobClient:  map 100% reduce 100%
11/05/20 15:03:46 INFO mapred.JobClient: Job complete: job_201105201500_0001
11/05/20 15:03:46 INFO mapred.JobClient: Counters: 25
11/05/20 15:03:46 INFO mapred.JobClient:   Job Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Launched reduce tasks=1
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=72909
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all reduces waiting  after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
11/05/20 15:03:46 INFO mapred.JobClient:     Launched map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     Data-local map tasks=10
11/05/20 15:03:46 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=76116
11/05/20 15:03:46 INFO mapred.JobClient:   File Output Format Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Written=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   FileSystemCounters
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_READ=4462381
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_READ=6950740
11/05/20 15:03:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=7546513
11/05/20 15:03:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=1412473
11/05/20 15:03:46 INFO mapred.JobClient:   File Input Format Counters 
11/05/20 15:03:46 INFO mapred.JobClient:     Bytes Read=6949956
11/05/20 15:03:46 INFO mapred.JobClient:   Map-Reduce Framework
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input groups=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Map output materialized bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Combine output records=201001
11/05/20 15:03:46 INFO mapred.JobClient:     Map input records=137146
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce shuffle bytes=2914947
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce output records=128510
11/05/20 15:03:46 INFO mapred.JobClient:     Spilled Records=507835
11/05/20 15:03:46 INFO mapred.JobClient:     Map output bytes=11435785
11/05/20 15:03:46 INFO mapred.JobClient:     Combine input records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     Map output records=1174986
11/05/20 15:03:46 INFO mapred.JobClient:     SPLIT_RAW_BYTES=784
11/05/20 15:03:46 INFO mapred.JobClient:     Reduce input records=201001

I googled the problem, and the people at Apache seem to suggest it could be anything from a networking problem (or something to do with the /etc/hosts files) to a corrupt disk on the slave nodes.

Just to add: I do see 2 "live nodes" on the namenode admin panel (localhost:50070/dfshealth), and under Map/Reduce Admin I see 2 nodes as well.
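For reference, the live-node count can also be checked from the command line with the stock HDFS admin tool (a standard Hadoop command, not something mentioned in the original post); it should list 2 live datanodes:

bin/hadoop dfsadmin -report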

Any clues as to how I can avoid these errors? Thanks in advance.

Update 1:

The tasktracker log is at http://pastebin.com/XMkNBJTh and the datanode log is at http://pastebin.com/ttjR7AYZ

Thanks a lot.

Recommended answer

Modify the datanode's /etc/hosts file.

Each line is divided into three parts: the first part is the network IP address, the second part is the hostname or domain name, and the third part is the host alias.
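For illustration only, a hosts entry in that three-part format might look like this (the address and names here are made up, not from the original answer):

192.168.0.101   node1.localdomain   node1

The detailed steps are as follows: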

  1. First check the host name:

cat /proc/sys/kernel/hostname

You will see the HOSTNAME value. To change it to the corresponding IP, use the command:

hostname ***.***.***.***

Replace the asterisks with the corresponding IP address.
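For example, with the datanode address used in the hosts snippet below (this answer sets the hostname to the node's own IP), the command would be:

hostname 10.200.187.77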

Modify the hosts configuration similarly, as follows:

127.0.0.1       localhost.localdomain     localhost
::1             localhost6.localdomain6   localhost6
10.200.187.77   10.200.187.77             hadoop-datanode
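A quick way to confirm that the new mapping resolves (an added check, not part of the original answer) is to ping the alias from each node:

ping -c 1 hadoop-datanode
# expect a reply from 10.200.187.77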

If the IP address is configured and the change was applied successfully, or if the hostname still shows a problem, continue modifying the hosts file.
