hadoop-streaming: reducer in pending state, doesn't start?
I have a map-reduce job which was running fine until I started to see some failed map tasks, like:
attempt_201110302152_0003_m_000010_0 task_201110302152_0003_m_000010 worker1 FAILED
Task attempt_201110302152_0003_m_000010_0 failed to report status for 602 seconds. Killing!
-------
Task attempt_201110302152_0003_m_000010_0 failed to report status for 607 seconds. Killing!
attempt_201110302152_0003_m_000010_1 task_201110302152_0003_m_000010 master FAILED
java.lang.RuntimeException: java.io.IOException: Spill failed
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:255)
Caused by: java.io.IOException: Spill failed
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:381)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill11.out
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)
and now the reducer doesn't start executing, whereas earlier the reducer used to start copying data even while the map tasks were still running. All I see is this:
11/10/31 03:35:12 INFO streaming.StreamJob: map 95% reduce 0%
11/10/31 03:44:01 INFO streaming.StreamJob: map 96% reduce 0%
11/10/31 03:51:56 INFO streaming.StreamJob: map 97% reduce 0%
11/10/31 03:55:41 INFO streaming.StreamJob: map 98% reduce 0%
11/10/31 04:04:18 INFO streaming.StreamJob: map 99% reduce 0%
11/10/31 04:20:32 INFO streaming.StreamJob: map 100% reduce 0%
I am a newbie to hadoop and mapreduce and don't really know what might be causing the same code, which ran successfully earlier, to fail.
Please help
Thank you
You should have a look at mapred.task.timeout. If you have a very large amount of data and few machines to process it, your task might be timing out. You can set this value to 0, which disables the timeout.
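For a streaming job, the timeout can be set per job with a generic `-D` option on the command line. A sketch is below; note that `mapred.task.timeout` is in milliseconds (the default 600000 ms matches the "602 seconds" in the log above), and the input/output paths and script names are illustrative placeholders, not from the original post:

```shell
# Disable the task timeout for this job only (0 = never time out).
# Paths, jar location, and mapper/reducer names are placeholders.
hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -D mapred.task.timeout=0 \
    -input /path/to/input \
    -output /path/to/output \
    -mapper mapper.py \
    -reducer reducer.py
```

Disabling the timeout entirely is a blunt instrument; raising it to a larger value (e.g. `-D mapred.task.timeout=1800000` for 30 minutes) keeps some protection against genuinely hung tasks.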
Alternatively, you can call context.progress (or some equivalent function) to report that something is happening, so that the job doesn't time out.
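In a Java job you would call context.progress() directly; in a streaming job the same effect comes from writing `reporter:` lines to stderr, which the framework treats as progress. A minimal sketch of a heartbeat in a streaming mapper (the mapper body and the 10,000-record interval are illustrative assumptions):

```python
import sys

def status_line(message):
    # Hadoop Streaming interprets a stderr line of the form
    # "reporter:status:<message>" as a status update; it resets the
    # task's progress timer, much like context.progress() in Java.
    return "reporter:status:%s\n" % message

def counter_line(group, counter, amount=1):
    # "reporter:counter:<group>,<counter>,<amount>" increments a job
    # counter, which also counts as progress.
    return "reporter:counter:%s,%s,%d\n" % (group, counter, amount)

def mapper(stream_in=sys.stdin, stream_err=sys.stderr):
    # Hypothetical streaming mapper: emit a heartbeat periodically so
    # long-running work does not trip mapred.task.timeout.
    for i, line in enumerate(stream_in):
        # ... process `line` and write key/value pairs to stdout ...
        if i % 10000 == 0:
            stream_err.write(status_line("processed %d records" % i))
```

This only helps if the mapper is alive but slow; in this particular trace the root cause ("Could not find any valid local directory for output/spill11.out") is worth checking separately.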