keep failing in running hadoop distributed mode


I've been stuck on this problem for a very long time. I'm trying to run a job in distributed mode. I have 2 datanodes and a master running the namenode and jobtracker. I keep getting the following error in the tasktracker.log of each node:

2012-01-03 08:48:30,910 WARN  mortbay.log - /mapOutput: org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_1/output/file.out.index in any of the configured local directories
2012-01-03 08:48:40,927 WARN  mapred.TaskTracker - getMapOutput(attempt_201201031846_0001_m_000000_2,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_2/output/file.out.index in any of the configured local directories
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:324)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
    at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

and this error in the hadoop.log of the slave:

2012-01-03 10:20:36,732 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 adding host localhost to penalty box, next contact in 4 seconds
2012-01-03 10:20:41,738 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 copy failed: attempt_201201031954_0006_m_000001_2 from localhost
2012-01-03 10:20:41,738 WARN  mapred.ReduceTask - java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
    at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
    ... 4 more

2012-01-03 10:20:41,739 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 adding host localhost to penalty box, next contact in 4 seconds
2012-01-03 10:20:46,761 WARN  mapred.ReduceTask - attempt_201201031954_0006_r_000001_0 copy failed: attempt_201201031954_0006_m_000000_3 from localhost
2012-01-03 10:20:46,762 WARN  mapred.ReduceTask - java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000000_3&reduce=1
    at sun.reflect.GeneratedConstructorAccessor6.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at sun.net.www.protocol.http.HttpURLConnection$6.run(HttpURLConnection.java:1491)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1485)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1139)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
Caused by: java.io.FileNotFoundException: http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000000_3&reduce=1
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1434)
    ... 4 more

This is my configuration:

mapred-site:

<property>
  <name>mapred.job.tracker</name>
  <value>10.20.1.112:9001</value>
  <description>The host and port that the MapReduce job tracker runs
  at.</description>
</property>

<property> 
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description> 
</property> 

<property> 
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description> 
</property> 

<property>
  <name>mapred.system.dir</name>
  <value>filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>filesystem/mapreduce/local</value>
</property>

<property>
  <name>mapred.submit.replication</name>
  <value>2</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <value>tmp</value>
</property>

<property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx2048m</value>
</property>

core-site:

<property>
  <name>fs.default.name</name>
  <value>hdfs://10.20.1.112:9000</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation.
  </description>
</property>

I've tried playing with the tmp dir, which didn't help. I've tried playing with mapred.local.dir, which also didn't help.

I also tried to see what is in the filesystem dir during runtime. I found that the path taskTracker/jobcache/job_201201031846_0001/attempt_201201031846_0001_m_000000_1/ exists, but it doesn't have an output folder in it.
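The runtime check described above can be scripted. A sketch using a mock directory tree to illustrate the symptom (replace the mock root with your actual mapred.local.dir; the job and attempt IDs are taken from the logs in the question):

```shell
# Build a mock jobcache layout like the one observed in the question:
# the attempt directory exists, but it has no output/ subfolder.
root=mock-local-dir/taskTracker/jobcache/job_201201031846_0001
mkdir -p "$root/attempt_201201031846_0001_m_000000_1"

# Flag attempt dirs that are missing the map output index the servlet looks for
for d in "$root"/attempt_*; do
  [ -f "$d/output/file.out.index" ] || echo "missing map output: $d"
done
```

On a real tasktracker, an attempt directory without output/file.out.index is exactly what produces the DiskErrorException in the first log.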

Any ideas?

Thanks.

Solution

I think the problem is this: your tasktracker needs to request the map output from the master, so the URL should be:

http://10.20.1.112:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1

but on your task node it tries to fetch it from

http://localhost:50060/mapOutput?job=job_201201031954_0006&map=attempt_201201031954_0006_m_000001_2&reduce=1  

That is why the problem occurs. The main problem is not hadoop.tmp.dir, mapred.system.dir or mapred.local.dir. I faced this problem too, and I resolved it by deleting the "127.0.0.1 localhost" line in /etc/hosts on the master. Maybe you can try it!
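One quick way to see the kind of mismatch described above is to check name resolution on each node (a sketch, assuming a Linux box with getent available):

```shell
# Check how "localhost" resolves and what name this node calls itself.
# If the master's /etc/hosts maps 127.0.0.1 to localhost, the jobtracker may
# hand reducers "localhost" as the map-output host, and every slave then tries
# to fetch map output from its own loopback interface, which fails as in the
# FileNotFoundException logs above.
getent hosts localhost
hostname
```

Run this on the master and on each slave; the hostname each node reports should resolve to its real cluster IP, not 127.0.0.1.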

EDIT

In summary, go to the /etc/hosts file on the node that's causing the error and remove the line 127.0.0.1 localhost.
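The edit can be sketched as follows, applied to a copy of the file so it is safe to experiment with (on the real node you would edit /etc/hosts itself as root, and the slave entries below are illustrative; only 10.20.1.112 comes from the question's config):

```shell
# Work on a sample copy of an /etc/hosts like the one described above
cat > hosts.fixed <<'EOF'
127.0.0.1   localhost
10.20.1.112 master
10.20.1.113 slave1
EOF

# Drop the line mapping 127.0.0.1 to localhost
sed -i '/^127\.0\.0\.1[[:space:]].*localhost/d' hosts.fixed
cat hosts.fixed
```

After removing the line, restart the Hadoop daemons so the tasktrackers re-register under their resolvable hostnames.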
