Oozie: Launch Map-Reduce from Oozie <java> action?


Problem description


I am trying to execute a Map-Reduce task in an Oozie workflow using a <java> action.

O'Reilly's Apache Oozie (Islam and Srinivasan 2015) notes that:

While it’s not recommended, Java action can be used to run Hadoop MapReduce jobs because MapReduce jobs are nothing but Java programs after all. The main class invoked can be a Hadoop MapReduce driver and can call Hadoop APIs to run a MapReduce job. In that mode, Hadoop spawns more mappers and reducers as required and runs them on the cluster.

However, I'm not having success using this approach.

The action definition in the workflow looks like this:

<java>
    <!-- Namenode etc. in global configuration -->
    <prepare>
      <delete path="${transformOut}" />
    </prepare>
    <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>default</value>
        </property>
    </configuration>
    <main-class>package.containing.TransformTool</main-class>
    <arg>${transformIn}</arg>
    <arg>${transformOut}</arg>
    <file>${avroJar}</file>
    <file>${avroMapReduceJar}</file>
</java>

The Tool implementation's main() implementation looks like this:

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new TransformTool(), args);
    if (res != 0) {
        throw new Exception("Error running MapReduce.");
    }
}
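
For reference, a stripped-down run() for a driver of this shape (with a placeholder job name and the identity Mapper/Reducer standing in for the real Avro classes, which are not shown in the question) looks roughly like this; combined with the main() above it is a complete new-API driver:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;

public class TransformTool extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() holds the Configuration that ToolRunner assembled from
        // -D options and, under Oozie, from the action's <configuration> block.
        Job job = Job.getInstance(getConf(), "transform");
        job.setJarByClass(TransformTool.class);

        // Placeholder identity Mapper/Reducer; the real driver would plug in
        // the Avro mapper/reducer classes and schemas here.
        job.setMapperClass(Mapper.class);
        job.setReducerClass(Reducer.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));   // ${transformIn}
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // ${transformOut}

        // waitForCompletion(true) prints progress and counters to stdout,
        // which ends up in the Oozie launcher container's logs.
        return job.waitForCompletion(true) ? 0 : 1;
    }
}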

The workflow will crash with the "Error running MapReduce" exception above every time; how do I get the output of the MapReduce to diagnose the problem? Is there a problem with using this Tool to run a MapReduce application? Am I using the wrong API calls?

I am extremely disinclined to use the Oozie <map-reduce> action, as each action in the workflow relies on several separately versioned AVRO schemas.

What's the issue here? I am using the 'new' mapreduce API for the task.

Thanks for any help.

Solution

> how do I get the output of the MapReduce...

Back to the basics.

Since you don't care to mention which version of Hadoop and which version of Oozie you are using, I will assume a "recent" setup (e.g. Hadoop 2.7 w/ TimelineServer and Oozie 4.2). And since you don't mention which kind of interface you use (command-line? native Oozie/YARN UI? Hue?), I will give a few examples using the good ol' CLI.

> oozie jobs -localtime -len 10 -filter name=CrazyExperiment

Shows the last 10 executions of the "CrazyExperiment" workflow, so that you can inject the appropriate "Job ID" into the next commands.

> oozie job -info 0000005-151217173344062-oozie-oozi-W

Shows the status of that execution, from Oozie's point of view. If your Java action is stuck in PREP mode, then Oozie failed to submit it to YARN; otherwise you will find something like job_1449681681381_5858 under "External ID". But beware! The job_ prefix is a legacy thing; the actual YARN ID is application_1449681681381_5858.

> oozie job -log 0000005-151217173344062-oozie-oozi-W

Shows the Oozie log, as could be expected.

> yarn logs -applicationId application_1449681681381_5858

Shows the consolidated logs for the AppMaster (container #1) and the Java action Launcher (container #2) -- after execution is over. The Launcher's stdout log contains a whole shitload of Oozie debug stuff; the real stdout is at the very bottom.

In case your Java action successfully spawned another YARN job, and you were careful to display the child "application ID", you should be able to retrieve it there and run another yarn logs command against it.
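
One suggestion beyond what the answer itself prescribes: have the driver print the child job ID right after submission, so it lands near the bottom of the Launcher's stdout. Assuming `job` is the configured org.apache.hadoop.mapreduce.Job from a driver like the one sketched in the question, replacing a plain waitForCompletion(true) call with the fragment below would do it:

        // Submit asynchronously first so the job ID is available immediately.
        job.submit();
        // JobID prints as job_<cluster-timestamp>_<sequence>; swap the "job_"
        // prefix for "application_" to get the ID that
        // `yarn logs -applicationId ...` expects.
        System.out.println("Launched child MapReduce job: " + job.getJobID());
        return job.waitForCompletion(true) ? 0 : 1;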

Enjoy your next 5 days of debugging ;-)
