Launching a Spark program using an Oozie workflow


Problem description

I am working with a Scala program that uses Spark packages. Currently I run the program with the following bash command from the gateway:

    /homes/spark/bin/spark-submit --master yarn-cluster --class "com.xxx.yyy.zzz" --driver-java-options "-Dyyy.num=5" a.jar arg1 arg2

I would like to start using Oozie to run this job. I have a few questions:

- Where should I put the spark-submit executable? On HDFS?
- How do I define the Spark action? Where should the --driver-java-options appear?
- What should the Oozie action look like? Is it similar to the one appearing in http://stackoverflow.com/questions/29098841/using-apache-oozie-ssh-actions-to-execute-spark-submit-why-does-the-spark-appli ?

Answer

If you have a new enough version of Oozie, you can use Oozie's Spark action:

https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
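For reference, a minimal sketch of a workflow using that dedicated `<spark>` action, assuming the spark-action-0.1 schema linked above. The element names follow that XSD; the master, class, jar path, and options are placeholders taken from the question's spark-submit command, and the driver JVM options would go into `<spark-opts>`:

```xml
<workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="spark-node"/>
    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>zzz-spark-job</name>
            <class>com.xxx.yyy.zzz</class>
            <!-- jar must be reachable from the cluster, e.g. on HDFS -->
            <jar>${nameNode}/user/spark/apps/a.jar</jar>
            <!-- driver JVM options are passed through spark-opts -->
            <spark-opts>--driver-java-options -Dyyy.num=5</spark-opts>
            <arg>arg1</arg>
            <arg>arg2</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Spark job failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```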

Otherwise, you need to execute a Java task that will call spark-submit. Something like:

   <java>
        <main-class>org.apache.spark.deploy.SparkSubmit</main-class>

        <arg>--class</arg>
        <arg>${spark_main_class}</arg> <!-- this is the class, e.g. com.xxx.yyy.zzz -->

        <arg>--deploy-mode</arg>
        <arg>cluster</arg>

        <arg>--master</arg>
        <arg>yarn</arg>

        <arg>--queue</arg>
        <arg>${queue_name}</arg> <!-- depends on your oozie config -->

        <arg>--num-executors</arg>
        <arg>${spark_num_executors}</arg>

        <arg>--executor-cores</arg>
        <arg>${spark_executor_cores}</arg>

        <arg>${spark_app_file}</arg> <!-- jar that contains your spark job, written in scala -->

        <arg>${input}</arg> <!-- some arg -->
        <arg>${output}</arg> <!-- some other arg -->

        <file>${spark_app_file}</file>

        <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
    </java>
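The `${...}` parameters above would be supplied at submission time, typically via a job.properties file. A hypothetical sketch, with every host, path, and value a placeholder you must adapt to your cluster:

```properties
# job.properties -- all values below are hypothetical placeholders
nameNode=hdfs://namenode-host:8020
jobTracker=resourcemanager-host:8032
queue_name=default
spark_main_class=com.xxx.yyy.zzz
spark_num_executors=10
spark_executor_cores=2
# jar uploaded to HDFS so the cluster-side action can read it
spark_app_file=${nameNode}/user/spark/apps/a.jar
input=arg1
output=arg2
oozie.wf.application.path=${nameNode}/user/spark/workflows/spark-wf
```

Note the two `<file>` elements in the action: they ship the application jar and the spark-assembly jar into the action's working directory, which is why spark-submit itself does not need to be installed on the worker nodes.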

