Hadoop error stalling job reduce process


Problem description


I have been running a Hadoop job (the word count example) a few times on my two-node cluster setup, and it's been working fine up until now. I keep getting a RuntimeException which stalls the reduce process at 19%:

    2013-04-13 18:45:22,191 INFO org.apache.hadoop.mapred.Task: Task:attempt_201304131843_0001_m_000000_0 is done. And is in the process of commiting
    2013-04-13 18:45:22,299 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201304131843_0001_m_000000_0' done.
    2013-04-13 18:45:22,318 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
    2013-04-13 18:45:23,181 WARN org.apache.hadoop.mapred.Child: Error running child
    java.lang.RuntimeException: Error while running command to get file permissions : org.apache.hadoop.util.Shell$ExitCodeException: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:255)
    at org.apache.hadoop.util.Shell.run(Shell.java:182)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:375)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:461)
    at org.apache.hadoop.util.Shell.execCommand(Shell.java:444)
    at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:710)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:443)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
    at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
    at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:468)
    at org.apache.hadoop.fs.RawLocalFileSystem$RawLocalFileStatus.getOwner(RawLocalFileSystem.java:426)
    at org.apache.hadoop.mapred.TaskLog.obtainLogDirOwner(TaskLog.java:267)
    at org.apache.hadoop.mapred.TaskLogsTruncater.truncateLogs(TaskLogsTruncater.java:124)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:260)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)

Does anyone have any ideas about what might be causing this?

Edit: Solved it myself. If anyone else runs into the same problem, this was caused by the /etc/hosts file on the master node. I hadn't entered the hostname and address of the slave node. This is how my hosts file is structured on the master node:

    127.0.0.1   MyUbuntuServer
    192.xxx.x.xx2   master
    192.xxx.x.xx3   MySecondUbuntuServer
    192.xxx.x.xx3   slave
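
A quick way to verify a fix like this (assuming standard Linux name-resolution tools; the `master`/`slave` names are the ones from my hosts file above) is to check, from the master node, that every configured hostname resolves and is reachable:

    # run on the master node; both names should resolve to the intended IPs
    getent hosts master slave
    # should log in non-interactively and print the slave's hostname
    ssh slave hostname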

Solution

A similar problem is described here: http://comments.gmane.org/gmane.comp.apache.mahout.user/8898

The info there might relate to a different version of Hadoop. It says:

    java.lang.RuntimeException: Error while running command to get file permissions : java.io.IOException: Cannot run program "/bin/ls": error=12, Not enough space

The solution there was to change the heap size through mapred.child.java.opts (-Xmx1200M). The error=12 ("Not enough space") suggests the task JVM could not fork the /bin/ls child process for lack of memory.
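
For reference, on Hadoop 1.x that property is typically set in conf/mapred-site.xml; a minimal sketch (the -Xmx1200M value is just the figure quoted above, so tune it to your own nodes' memory):

    <!-- conf/mapred-site.xml: JVM options passed to each map/reduce child task -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1200M</value>
    </property>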

See also: https://groups.google.com/a/cloudera.org/forum/?fromgroups=#!topic/cdh-user/BHGYJDNKMGE

HTH, Avner
