Oozie workflow hive action stuck in RUNNING


Problem description

I am running Hadoop 2.4.0, Oozie 4.0.0, Hive 0.13.0 from Hortonworks distro.

I have multiple Oozie coordinator jobs that can potentially launch workflows all around the same time. The coordinator jobs each watch different directories and when the _SUCCESS files show up in those directories, the workflow would be launched.
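
For reference, a coordinator that fires on a _SUCCESS done-flag typically looks like the sketch below. This is a minimal, hypothetical example (the app name, paths, frequency, and dates are placeholders, not taken from the actual setup); the <done-flag> element is what makes Oozie wait for the _SUCCESS file before materializing the workflow.

<coordinator-app name="watch-input-dir" frequency="${coord:minutes(15)}"
                 start="2015-02-18T00:00Z" end="2015-12-31T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- Oozie only materializes an action once the done-flag file exists in the resolved URI -->
    <dataset name="input" frequency="${coord:minutes(15)}"
             initial-instance="2015-02-18T00:00Z" timezone="UTC">
      <uri-template>${nameNode}/data/incoming/${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="inputReady" dataset="input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${nameNode}/apps/oozie/hive-copy-wf</app-path>
      <configuration>
        <!-- Pass the resolved input directory to the workflow as INPUT_LOCATION -->
        <property>
          <name>INPUT_LOCATION</name>
          <value>${coord:dataIn('inputReady')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>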

The workflow runs a Hive action that reads from an external directory and copies the data:

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

DROP TABLE IF EXISTS ${INPUT_TABLE};

CREATE external TABLE IF NOT EXISTS ${INPUT_TABLE} (
       id bigint,
       data string,
       creationdate timestamp,
       datelastupdated timestamp)
LOCATION '${INPUT_LOCATION}';

-- Read from external table and insert into a partitioned Hive table
FROM ${INPUT_TABLE} ent
INSERT OVERWRITE TABLE mytable PARTITION(data)
SELECT ent.id, ent.data, ent.creationdate, ent.datelastupdated;
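
The Hive script above is invoked from the workflow's hive action. A minimal workflow.xml sketch is shown below, assuming the script is deployed next to the workflow as copy_to_partitioned.hql and that INPUT_TABLE and INPUT_LOCATION are passed in as parameters (the file and action names are hypothetical; only the two ${...} variables come from the script above).

<workflow-app name="hive-copy-wf" xmlns="uri:oozie:workflow:0.4">
  <start to="hive-copy"/>
  <action name="hive-copy">
    <hive xmlns="uri:oozie:hive-action:0.2">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <!-- The HQL script shown above; <param> values substitute its ${...} variables -->
      <script>copy_to_partitioned.hql</script>
      <param>INPUT_TABLE=${INPUT_TABLE}</param>
      <param>INPUT_LOCATION=${INPUT_LOCATION}</param>
    </hive>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Hive action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
  </kill>
  <end name="end"/>
</workflow-app>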

When I run only one coordinator to launch one workflow, the workflow and Hive action complete successfully without any problems.

When multiple workflows are launched around the same time, the hive action stays in RUNNING for a long time.

If I look at the job syslogs, I see this:

2015-02-18 17:18:26,048 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskImpl: task_1423085109915_0223_m_000000 Task Transitioned from SCHEDULED to RUNNING
2015-02-18 17:18:26,586 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1423085109915_0223: ask=3 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:32768, vCores:-3> knownNMs=1
2015-02-18 17:18:27,677 INFO [Socket Reader #1 for port 38704] SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for job_1423085109915_0223 (auth:SIMPLE)
2015-02-18 17:18:27,696 INFO [IPC Server handler 0 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID : jvm_1423085109915_0223_m_000002 asked for a task
2015-02-18 17:18:27,697 INFO [IPC Server handler 0 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: JVM with ID: jvm_1423085109915_0223_m_000002 given task: attempt_1423085109915_0223_m_000000_0
2015-02-18 17:18:34,951 INFO [IPC Server handler 2 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:19:05,060 INFO [IPC Server handler 11 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:19:35,161 INFO [IPC Server handler 28 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:20:05,262 INFO [IPC Server handler 2 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:20:35,358 INFO [IPC Server handler 11 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:21:02,452 INFO [IPC Server handler 23 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:21:32,545 INFO [IPC Server handler 1 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0
2015-02-18 17:22:02,668 INFO [IPC Server handler 12 on 38704] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Progress of TaskAttempt attempt_1423085109915_0223_m_000000_0 is : 1.0 

It just kept printing the "Progress of TaskAttempt" over and over.

Our yarn-site.xml is configured to use this:

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>

Should I be using a different scheduler instead?

At this point I am not sure if the issue is in Oozie or Hive.

Solution

It turns out this is the same issue as the HEART BEAT issue listed here:

Error on running multiple Workflow in OOZIE-4.1.0

After changing the scheduler to the FairScheduler as noted in the above post, I was able to run multiple workflows.
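
For completeness, the change boils down to swapping the scheduler class in yarn-site.xml and restarting the ResourceManager (fair-scheduler queue configuration, if any, is omitted here):

    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <!-- Replace the CapacityScheduler with the FairScheduler -->
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
    </property>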
