hadoop-streaming: reducer in pending state, doesn't start?


Problem description

I have a map-reduce job that was running fine until I started to see failed map tasks like the following:

attempt_201110302152_0003_m_000010_0    task_201110302152_0003_m_000010 worker1 FAILED  
Task attempt_201110302152_0003_m_000010_0 failed to report status for 602 seconds. Killing!
-------
Task attempt_201110302152_0003_m_000010_0 failed to report status for 607 seconds. Killing!
attempt_201110302152_0003_m_000010_1    task_201110302152_0003_m_000010 master  FAILED  
java.lang.RuntimeException: java.io.IOException: Spill failed
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:325)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:261)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
    at org.apache.hadoop.mapred.Child.main(Child.java:255)
Caused by: java.io.IOException: Spill failed
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1029)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:592)
    at org.apache.hadoop.streaming.PipeMapRed$MROutputThread.run(PipeMapRed.java:381)
Caused by: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for output/spill11.out
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:127)
    at org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(MapOutputFile.java:121)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1392)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.access$1800(MapTask.java:853)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run(MapTask.java:1344)

and now the reducer doesn't start executing, whereas earlier it would start copying data even while map tasks were still running. All I see is this:

11/10/31 03:35:12 INFO streaming.StreamJob:  map 95%  reduce 0%
11/10/31 03:44:01 INFO streaming.StreamJob:  map 96%  reduce 0%
11/10/31 03:51:56 INFO streaming.StreamJob:  map 97%  reduce 0%
11/10/31 03:55:41 INFO streaming.StreamJob:  map 98%  reduce 0%
11/10/31 04:04:18 INFO streaming.StreamJob:  map 99%  reduce 0%
11/10/31 04:20:32 INFO streaming.StreamJob:  map 100%  reduce 0%

I am a newbie to hadoop and mapreduce and don't really know what might be causing the same code to fail now when it ran successfully earlier.

Please help

Thank you

Solution

You should have a look at mapred.task.timeout. If you have a very large amount of data and few machines to process it, your tasks might be timing out; the default is 600 seconds, which matches the "failed to report status for 602 seconds" messages in your log. You can set this value (it is in milliseconds) to 0, which disables the timeout entirely.
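
For a streaming job you can pass the property on the command line via the generic -D option, which sets it for that job only (to change it cluster-wide you would set mapred.task.timeout in mapred-site.xml). A minimal sketch, assuming a Hadoop 0.20/1.x-era streaming jar path and placeholder input/output paths and mapper/reducer scripts:

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
        -D mapred.task.timeout=0 \
        -input /user/me/input \
        -output /user/me/output \
        -mapper mapper.py \
        -reducer reducer.py \
        -file mapper.py -file reducer.py

Note that generic options like -D must come before the streaming-specific options, as shown above.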

Alternatively, you can call context.progress (or some equivalent mechanism) to signal that something is happening, so that the job doesn't time out.
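
In a streaming job there is no Java context object to call, but the equivalent mechanism is writing reporter:status:<message> lines to stderr, which the framework treats as a status update and a sign of progress. A minimal sketch of an identity mapper that does this (the 10000-record interval is an arbitrary choice for illustration):

    #!/usr/bin/env python
    # Identity mapper sketch: emits each input line unchanged and
    # periodically writes a reporter:status: line to stderr so the
    # task keeps reporting progress and is not killed by the timeout.
    import sys

    for n, line in enumerate(sys.stdin):
        sys.stdout.write(line)  # real per-record work would go here
        if n % 10000 == 0:
            # Hadoop Streaming parses stderr lines of the form
            # reporter:status:<message> and resets the timeout clock.
            sys.stderr.write('reporter:status:processed %d records\n' % (n + 1))

The same trick works in a reducer script.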
