Error on running multiple Workflow in OOZIE-4.1.0


Problem Description

I installed Oozie 4.1.0 on a Linux machine by following the steps at http://gauravkohli.com/2014/08/26/apache-oozie-installation-on-hadoop-2-4-1/

Hadoop - 2.6.0
Maven - 3.0.4
Pig - 0.12.0

Cluster Setup -

MASTER NODE running - Namenode, Resourcemanager, Proxyserver.

SLAVE NODE running - Datanode, Nodemanager.

When I run a single workflow job, it succeeds. But when I try to run more than one workflow job at a time, both jobs get stuck in the ACCEPTED state.

Inspecting the error log, I drilled down to the following:

2014-12-24 21:00:36,758 [JobControl] INFO  org.apache.hadoop.ipc.Client  - Retrying connect to server: 172.16.***.***/172.16.***.***:8032. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-12-25 09:30:39,145 [communication thread] INFO  org.apache.hadoop.ipc.Client  - Retrying connect to server: 172.16.***.***/172.16.***.***:52406. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2014-12-25 09:30:39,199 [communication thread] INFO  org.apache.hadoop.mapred.Task  - Communication exception: java.io.IOException: Failed on local exception: java.net.SocketException: Network is unreachable: no further information; Host Details : local host is: "SystemName/127.0.0.1"; destination host is: "172.16.***.***":52406; 
 at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
 at org.apache.hadoop.ipc.Client.call(Client.java:1415)
 at org.apache.hadoop.ipc.Client.call(Client.java:1364)
 at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:231)
 at $Proxy9.ping(Unknown Source)
 at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:742)
 at java.lang.Thread.run(Thread.java:722)
Caused by: java.net.SocketException: Network is unreachable: no further information
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
 at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:701)
 at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
 at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
 at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
 at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
 at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
 at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
 at org.apache.hadoop.ipc.Client.call(Client.java:1382)
 ... 5 more

Heart beat
Heart beat
.
.

With the jobs stuck in this state, if I manually kill any one of the launcher jobs (hadoop job -kill <launcher-job-id>), all the remaining jobs succeed. So I think the problem is that more than one launcher job running simultaneously causes the jobs to deadlock, as sketched below.
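
For reference, a minimal sketch of that manual workaround (the job ID below is a placeholder; take the actual launcher job ID from the job list):

# list running MapReduce jobs to find the Oozie launcher's job ID
hadoop job -list

# kill one launcher job (placeholder ID); the remaining jobs
# then get containers and run to completion
hadoop job -kill job_XXXXXXXXXXXXX_XXXX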

If anyone knows the reason for and solution to the above problem, please help me out as soon as possible.

Solution

The problem is with the queue. When we run the jobs in the SAME QUEUE (DEFAULT) with the above cluster setup, the ResourceManager is responsible for running the MapReduce jobs on the slave node, and due to the lack of resources on that node, the jobs running in the queue meet a deadlock situation.

To overcome this issue, we need to split up the MapReduce jobs by triggering them in different queues (the target queue must first be defined in the scheduler, as sketched below).
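
For the launcher2 queue used below to exist, it has to be defined in the cluster's scheduler configuration. A minimal sketch for the CapacityScheduler (assuming that is the scheduler in use; the queue name and the 50/50 capacity split are illustrative), in capacity-scheduler.xml:

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,launcher2</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.launcher2.capacity</name>
  <value>50</value>
</property>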

You can do this by setting the following in the pig action inside your Oozie workflow.xml:

<configuration>
  <property>
    <name>mapreduce.job.queuename</name>
    <value>launcher2</value>
  </property>
</configuration>
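
For context, a sketch of where that block sits inside a complete pig action (the workflow name, node names, and script path are illustrative, not from the original post):

<workflow-app name="pig-wf" xmlns="uri:oozie:workflow:0.2">
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- route this action's MapReduce job to the launcher2 queue -->
            <configuration>
                <property>
                    <name>mapreduce.job.queuename</name>
                    <value>launcher2</value>
                </property>
            </configuration>
            <script>script.pig</script>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>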

NOTE: This solution is only for a SMALL CLUSTER SETUP.
