Add Spark to Oozie shared lib

Problem description

By default, Oozie's shared lib directory provides libraries for Hive, Pig, and MapReduce. If I want to run a Spark job on Oozie, it might be better to add the Spark lib jars to Oozie's shared lib instead of copying them to the app's lib directory.
How can I add the Spark lib jars (including spark-core and its dependencies) to Oozie's shared lib? Any comment or answer is appreciated.

Solution

The Spark action is scheduled to be released with Oozie 4.2.0, even though the docs seem to be a bit behind. See the related JIRA: Oozie JIRA - Add spark action executor

Cloudera's CDH 5.4 release already has it, though; see the official doc: CDH 5.4 oozie doc - Oozie Spark Action Extension

With older versions of Oozie, the jars can be shared in several ways. The first approach may work best. Here is the complete list anyway:

Below are the various ways to include a jar with your workflow:

Set oozie.libpath=/path/to/jars,another/path/to/jars in job.properties.

This is useful if you have many workflows that all need the same jar; you can put it in one place in HDFS and use it with many workflows. The jars will be available to all actions in that workflow. There is no need to ever point this at the ShareLib location. (I see that in a lot of workflows.) Oozie knows where the ShareLib is and will include it automatically if you set oozie.use.system.libpath=true in job.properties.
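As a rough illustration of the first approach, the sketch below renders a job.properties file that points oozie.libpath at a shared jar directory and turns on the system ShareLib. The helper name `render_job_properties` and every HDFS path are hypothetical placeholders; only the three property names are Oozie's own.

```python
def render_job_properties(app_path, lib_paths, use_system_libpath=True):
    """Render job.properties text for an Oozie workflow submission.

    app_path and lib_paths are HDFS locations supplied by the caller
    (hypothetical values in the example below).
    """
    props = {
        "oozie.wf.application.path": app_path,
        # Comma-separated HDFS directories; their jars reach every action.
        "oozie.libpath": ",".join(lib_paths),
        # Oozie adds its own ShareLib when this is true, so the ShareLib
        # location never needs to appear in oozie.libpath.
        "oozie.use.system.libpath": "true" if use_system_libpath else "false",
    }
    return "\n".join(f"{key}={value}" for key, value in props.items()) + "\n"


if __name__ == "__main__":
    # Placeholder paths; replace with your own cluster layout.
    print(render_job_properties(
        "hdfs://namenode.example.com:8020/user/alice/apps/spark-wf",
        ["/user/alice/libs/spark", "/user/alice/libs/common"],
    ), end="")
```

The Spark jars themselves would be uploaded once to the directories listed in oozie.libpath and then reused by every workflow that sets the same property.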

Create a directory named "lib" next to your workflow.xml in HDFS and put jars in there.

This is useful if you have some jars that you only need for one workflow. Oozie will automatically make those jars available to all actions in that workflow.
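To make the expected layout concrete, here is a small sketch that computes where each jar would need to sit in HDFS for this approach. The function name, the application directory, and the jar file names are all made up for illustration; the only fixed part is the `lib` directory next to workflow.xml.

```python
import posixpath


def workflow_lib_targets(app_dir, jar_names):
    """Return the HDFS path each jar should have so the workflow picks it up."""
    # The lib directory is a sibling of workflow.xml inside the app directory.
    lib_dir = posixpath.join(app_dir, "lib")
    return [posixpath.join(lib_dir, name) for name in jar_names]


if __name__ == "__main__":
    # Hypothetical application directory and jar names.
    for target in workflow_lib_targets(
        "/user/alice/apps/spark-wf",
        ["spark-core.jar", "my-job.jar"],
    ):
        print(target)
```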

Specify the <file> tag in an action with the path to a single jar; you can have multiple <file> tags.

This is useful if you want some jars only for a specific action and not all actions in a workflow. The downside is that you have to specify them in your workflow.xml, so if you ever need to add/remove some jars, you have to change your workflow.xml.
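The per-action approach relies on the `<file>` element of the workflow schema. The sketch below builds a minimal `<java>` action fragment with one `<file>` element per jar. It assumes a Java action purely for illustration; the action name, main class, jar paths, and the `end`/`fail` transition targets are hypothetical, and the workflow XML namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET


def java_action_with_files(name, main_class, jar_paths):
    """Build an <action> element whose <java> body lists one <file> per jar."""
    action = ET.Element("action", {"name": name})
    java = ET.SubElement(action, "java")
    # ${jobTracker} and ${nameNode} are resolved from job.properties by Oozie.
    ET.SubElement(java, "job-tracker").text = "${jobTracker}"
    ET.SubElement(java, "name-node").text = "${nameNode}"
    ET.SubElement(java, "main-class").text = main_class
    for path in jar_paths:
        # Each jar gets its own <file> element, visible to this action only.
        ET.SubElement(java, "file").text = path
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})
    return action


if __name__ == "__main__":
    action = java_action_with_files(
        "spark-job",
        "com.example.SparkMain",  # hypothetical main class
        ["/user/alice/libs/spark/spark-core.jar"],  # hypothetical jar path
    )
    ET.indent(action)  # pretty-printing helper, Python 3.9+
    print(ET.tostring(action, encoding="unicode"))
```

Adding or removing a jar means regenerating or editing this fragment, which is the maintenance cost described above.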

Add jars to the ShareLib (e.g. /user/oozie/share/lib/lib_/pig)

While this will work, it's not recommended for two reasons. First, the additional jars will be included with every workflow using that ShareLib, which may be unexpected to those workflows and users. Second, when upgrading the ShareLib, you'll have to recopy the additional jars to the new ShareLib.

Quoted from Robert Kanter's blog: How-to: Use the ShareLib in Apache Oozie (CDH 5)
