# of failed Map Tasks exceeded allowed limit


Problem Description

I am trying my hand at Hadoop streaming using Python. I have written simple map and reduce scripts with help from here

The map script is as follows:

#!/usr/bin/env python

import sys, urllib, re

title_re = re.compile("<title>(.*?)</title>", re.MULTILINE | re.DOTALL | re.IGNORECASE)

for line in sys.stdin:
    url = line.strip()
    match = title_re.search(urllib.urlopen(url).read())
    if match :
        print url, "\t", match.group(1).strip()

and the reduce script is as follows:

#!/usr/bin/env python

from operator import itemgetter
import sys

for line in sys.stdin :
    line = line.strip()
    print line

After running these scripts with the hadoop streaming jar, the map tasks finish and I can see that they are 100% complete, but the reduce job gets stuck at 22%, and after a long time it fails with ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1.

I am not able to find out the exact reason behind this.

My terminal window looks as follows:

shekhar@ubuntu:/host/Shekhar/Softwares/hadoop-1.0.0$ hadoop jar contrib/streaming/hadoop-streaming-1.0.0.jar -mapper /host/Shekhar/HadoopWorld/MultiFetch.py -reducer /host/Shekhar/HadoopWorld/reducer.py -input /host/Shekhar/HadoopWorld/urls/* -output /host/Shekhar/HadoopWorld/titles3
Warning: $HADOOP_HOME is deprecated.

packageJobJar: [/tmp/hadoop-shekhar/hadoop-unjar2709939812732871143/] [] /tmp/streamjob1176812134999992997.jar tmpDir=null
12/05/27 11:27:46 INFO util.NativeCodeLoader: Loaded the native-hadoop library
12/05/27 11:27:46 INFO mapred.FileInputFormat: Total input paths to process : 3
12/05/27 11:27:46 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-shekhar/mapred/local]
12/05/27 11:27:46 INFO streaming.StreamJob: Running job: job_201205271050_0006
12/05/27 11:27:46 INFO streaming.StreamJob: To kill this job, run:
12/05/27 11:27:46 INFO streaming.StreamJob: /host/Shekhar/Softwares/hadoop-1.0.0/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201205271050_0006
12/05/27 11:27:46 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006
12/05/27 11:27:47 INFO streaming.StreamJob:  map 0%  reduce 0%
12/05/27 11:28:07 INFO streaming.StreamJob:  map 67%  reduce 0%
12/05/27 11:28:37 INFO streaming.StreamJob:  map 100%  reduce 0%
12/05/27 11:28:40 INFO streaming.StreamJob:  map 100%  reduce 11%
12/05/27 11:28:49 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:31:35 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:31:44 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:34:52 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:35:01 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:38:11 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:38:20 INFO streaming.StreamJob:  map 100%  reduce 22%
12/05/27 11:41:29 INFO streaming.StreamJob:  map 67%  reduce 22%
12/05/27 11:41:35 INFO streaming.StreamJob:  map 100%  reduce 100%
12/05/27 11:41:35 INFO streaming.StreamJob: To kill this job, run:
12/05/27 11:41:35 INFO streaming.StreamJob: /host/Shekhar/Softwares/hadoop-1.0.0/libexec/../bin/hadoop job  -Dmapred.job.tracker=localhost:9001 -kill job_201205271050_0006
12/05/27 11:41:35 INFO streaming.StreamJob: Tracking URL: http://localhost:50030/jobdetails.jsp?jobid=job_201205271050_0006
12/05/27 11:41:35 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201205271050_0006_m_000001
12/05/27 11:41:35 INFO streaming.StreamJob: killJob...
Streaming Job Failed!

Can anyone please help me?

EDIT: the job tracker details are as follows:

Hadoop job_201205271050_0006 on localhost

User: shekhar
Job Name: streamjob1176812134999992997.jar
Job File: file:/tmp/hadoop-shekhar/mapred/staging/shekhar/.staging/job_201205271050_0006/job.xml
Submit Host: ubuntu
Submit Host Address: 127.0.1.1
Job-ACLs: All users are allowed
Job Setup: Successful
Status: Failed
Failure Info:# of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201205271050_0006_m_000001
Started at: Sun May 27 11:27:46 IST 2012
Failed at: Sun May 27 11:41:35 IST 2012
Failed in: 13mins, 48sec
Job Cleanup: Successful
Black-listed TaskTrackers: 1
Kind      % Complete   Num Tasks   Pending   Running   Complete   Killed   Failed/Killed Task Attempts
map       100.00%      3           0         0         2          1        4 / 0
reduce    100.00%      1           0         0         0          1        0 / 1

Solution

This is just a generic error indicating that too many map tasks failed to complete:

# of failed Map Tasks exceeded allowed limit
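For context, the "allowed limit" the message refers to comes from the job configuration. A sketch, assuming the Hadoop 1.x (old-API) property names; the values shown are the usual defaults, not taken from the original post:

```xml
<!-- Hypothetical job configuration fragment (Hadoop 1.x old-API names).
     mapred.map.max.attempts: retries allowed per individual map task.
     mapred.max.map.failures.percent: share of map tasks that may fail
     without failing the whole job; with the default of 0, a single
     permanently failed task (FailedCount: 1) aborts the job. -->
<property>
  <name>mapred.map.max.attempts</name>
  <value>4</value>
</property>
<property>
  <name>mapred.max.map.failures.percent</name>
  <value>0</value>
</property>
```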

You can use the EMR Console to navigate to the logs for the individual map/reduce tasks. Then you should be able to see what the issue is.

In my case, I got this error when I made small mistakes, such as setting the path to the map script incorrectly.
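Because streaming reports only this generic limit error, a quick way to catch such mistakes is to run the scripts locally over a small sample before submitting the job. A minimal sketch; the smoke_test helper below is hypothetical, not part of the original answer:

```python
import subprocess
import sys

def smoke_test(script_path, input_lines):
    """Run a streaming script locally, feeding input_lines on stdin,
    the same way Hadoop streaming would. Returns its stdout; prints
    the script's traceback to stderr if it crashes."""
    proc = subprocess.run(
        [sys.executable, script_path],
        input="\n".join(input_lines) + "\n",
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        sys.stderr.write("script failed:\n%s" % proc.stderr)
    return proc.stdout
```

Any Python traceback (or a wrong path, which raises immediately) then surfaces directly in the terminal instead of being buried in per-task logs.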

Steps to view the logs of the tasks:

http://antipatterns.blogspot.nl/2013/03/amazon-emr-map-reduce-error-of-failed.html
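In this particular case, another likely culprit is an uncaught exception inside the mapper itself (for example, urllib.urlopen raising on an unreachable URL): each crash fails the task attempt, and once the retry limit is exhausted the job aborts with exactly this message. A defensive sketch of the mapper, ported to Python 3 as an assumption (the original uses Python 2):

```python
#!/usr/bin/env python3
# Defensive rewrite of the question's mapper (a sketch, not the poster's
# code): fetch failures are logged to stderr instead of crashing the task.
import re
import sys
import urllib.request  # Python 2's urllib.urlopen is urllib.request.urlopen here

title_re = re.compile(r"<title>(.*?)</title>",
                      re.MULTILINE | re.DOTALL | re.IGNORECASE)

def extract_title(html):
    """Return the contents of the first <title> element, or None."""
    match = title_re.search(html)
    return match.group(1).strip() if match else None

def main():
    for line in sys.stdin:
        url = line.strip()
        try:
            html = urllib.request.urlopen(url).read().decode("utf-8", "replace")
        except Exception as exc:
            # Visible in the task's stderr log; the task attempt survives.
            sys.stderr.write("failed to fetch %s: %s\n" % (url, exc))
            continue
        title = extract_title(html)
        if title:
            print("%s\t%s" % (url, title))

if __name__ == "__main__":
    main()
```

With the fetch wrapped in try/except, one dead URL no longer kills the whole task attempt, and the real error text ends up in the task's stderr log.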
