Oozie: Launch Map-Reduce from Oozie <java> action?


Problem description

I am trying to execute a Map-Reduce task in an Oozie workflow using a <java> action.

O'Reilly's Apache Oozie (Islam and Srinivasan 2015) notes that:

While it's not recommended, Java action can be used to run Hadoop MapReduce jobs because MapReduce jobs are nothing but Java programs after all. The main class invoked can be a Hadoop MapReduce driver and can call Hadoop APIs to run a MapReduce job. In that mode, Hadoop spawns more mappers and reducers as required and runs them on the cluster.

However, I'm not having success using this approach.

The action definition in the workflow looks like this:

<java>
    <!-- Namenode etc. in global configuration -->
    <prepare>
      <delete path="${transformOut}" />
    </prepare>
    <configuration>
        <property>
            <name>mapreduce.job.queuename</name>
            <value>default</value>
        </property>
    </configuration>
    <main-class>package.containing.TransformTool</main-class>
    <arg>${transformIn}</arg>
    <arg>${transformOut}</arg>
    <file>${avroJar}</file>
    <file>${avroMapReduceJar}</file>
</java>

The Tool implementation's main() method looks like this:

public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new TransformTool(), args);
    if (res != 0) {
        throw new Exception("Error running MapReduce.");
    }
}

The workflow crashes with the "Error running MapReduce" exception above every time. How do I get the output of the MapReduce job to diagnose the problem? Is there a problem with using this Tool to run a MapReduce application? Am I using the wrong API calls?

I am extremely disinclined to use the Oozie <map-reduce> action, as each action in the workflow relies on several separately versioned Avro schemas.

What's the issue here? I am using the "new" mapreduce API for the task.

Thanks for your help.

Recommended answer

> how do I get the output of the MapReduce...

Back to basics.

Since you don't mention which versions of Hadoop and Oozie you are using, I will assume a "recent" setup (e.g. Hadoop 2.7 w/ TimelineServer and Oozie 4.2). And since you don't mention which kind of interface you use (command line? native Oozie/YARN UI? Hue?), I will give a few examples using the good old CLI.

> oozie jobs -localtime -len 10 -filter name=CrazyExperiment

Shows the last 10 executions of the "CrazyExperiment" workflow, so that you can inject the appropriate "Job ID" into the next commands.

> oozie job -info 0000005-151217173344062-oozie-oozi-W

Shows the status of that execution, from Oozie's point of view. If your Java action is stuck in PREP mode, then Oozie failed to submit it to YARN; otherwise you will find something like job_1449681681381_5858 under "External ID". But beware! The job prefix is a legacy thing; the actual YARN ID is application_1449681681381_5858.
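The prefix swap described above can be sketched in a few lines of Java. This is only an illustration: the class name YarnIds and the method toApplicationId are hypothetical helpers, not part of Oozie or Hadoop, and the sketch assumes the numeric part of the ID is identical under both prefixes, as in the example IDs above.

```java
/** Hypothetical helper (not an Oozie or Hadoop API) for the ID rewrite described above. */
public final class YarnIds {
    private static final String JOB_PREFIX = "job_";
    private static final String APP_PREFIX = "application_";

    private YarnIds() {}

    /**
     * Turns a legacy "job_..." external ID into the "application_..." form
     * expected by the yarn CLI. Any other value is returned unchanged.
     */
    public static String toApplicationId(String externalId) {
        if (externalId != null && externalId.startsWith(JOB_PREFIX)) {
            return APP_PREFIX + externalId.substring(JOB_PREFIX.length());
        }
        return externalId;
    }

    public static void main(String[] args) {
        // Uses the External ID from the example above.
        System.out.println(toApplicationId("job_1449681681381_5858"));
    }
}
```

Running main prints application_1449681681381_5858, which is the value to pass to yarn logs -applicationId.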

> oozie job -log 0000005-151217173344062-oozie-oozi-W

Shows the Oozie log, as could be expected.

> yarn logs -applicationId application_1449681681381_5858

Shows the consolidated logs for the AppMaster (container #1) and the Java action Launcher (container #2), once execution is over. The stdout log for the Launcher contains a whole lot of Oozie debug output; the real stdout is at the very bottom.

In case your Java action successfully spawned another YARN job, and you were careful to display the child "application ID", you should be able to retrieve it there and run another yarn logs command against it.
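If the Launcher output has been saved to a file or string, pulling the child ID out of it can be automated. The sketch below is a rough illustration using only the Java standard library: the class ChildIdScanner is hypothetical, and it assumes only that IDs appear in the text in the job_<digits>_<digits> or application_<digits>_<digits> shape shown earlier. It makes no assumption about the surrounding log wording, and the sample lines in main are made up for the demonstration.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects YARN application IDs mentioned in a block of log text. */
public final class ChildIdScanner {
    // Matches both the legacy "job_" prefix and the "application_" prefix.
    private static final Pattern ID =
            Pattern.compile("\\b(?:job|application)_(\\d+_\\d+)\\b");

    private ChildIdScanner() {}

    /** Returns the distinct IDs found, normalized to the "application_" form, in order of appearance. */
    public static Set<String> findApplicationIds(String logText) {
        Set<String> ids = new LinkedHashSet<>();
        Matcher m = ID.matcher(logText);
        while (m.find()) {
            ids.add("application_" + m.group(1));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Illustrative text only, not a real Launcher log.
        String sample = "child submitted as application_1449681681381_5859\n"
                + "waiting for job_1449681681381_5859 to finish\n";
        for (String id : findApplicationIds(sample)) {
            System.out.println("yarn logs -applicationId " + id);
        }
    }
}
```

Both sample lines refer to the same child job, so the set holds a single entry. The Launcher's own application ID would also match if it appears in the text, so the result still needs a quick manual check.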

Enjoy your next 5 days of debugging ;-)
