OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml


Problem description


I'm new to Hadoop, and I'm testing a simple workflow with just a single Sqoop action. It works if I use plain values rather than global properties.

My objective, however, is to define some global properties in a file referenced by the job-xml tag in the global section.

After a long fight and reading many articles, I still cannot make it work. I suspect something simple is wrong, since I found articles suggesting that this feature works fine.

Hopefully, you can give me a hint.

In short:

  1. I have properties, dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
  2. These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
  3. At runtime, I receive an EL_ERROR saying that dbserver variable cannot be resolved

Here are details:

I'm using the Cloudera 5.7.1 distribution installed on a single node.

The environment.xml file was uploaded to HDFS, into the /user/dm/conf folder. Here is the content:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>

The workflow.xml file was uploaded to /user/dm/jobs/sqoop-test-job. Here is the content:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>       
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>

Now, I execute the Oozie workflow from the command line:

sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run

Where my job-config.xml is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>

Solution

OK, you are making two big mistakes.

1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2)

Workflow Functional Specification

  • has a section 19 about Global Configuration
  • has sections 3.2.x about core Action types i.e. MapReduce, Pig, Java, etc.
  • the XML schema specification clearly shows the <global> element

Sqoop action Extension

  • does not make any mention of Global parameters
  • has its own XML schema specification, which evolves at its own pace, and is not up-to-date with the Workflow schema

In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionalities, including the <global> thing that was introduced in Workflow schema V0.4


2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.

Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
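
As a rough illustration of the second option, a <parameters> block at the top of the workflow might look like the sketch below. The <parameters> element belongs to the Workflow schema from V0.4 onwards; which names to declare, and the default given for outputRootPath, are assumptions made only for this example.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <!-- Formal parameters of the workflow, declared before any other element -->
    <parameters>
        <property>
            <!-- No <value>: the submitter is expected to supply it, e.g. via -config -->
            <name>dbserver</name>
        </property>
        <property>
            <name>outputRootPath</name>
            <!-- Literal default (illustrative), used when nothing is supplied at submission -->
            <value>/user/dm/data/sqoop-test</value>
        </property>
    </parameters>
    <!-- <global>, <start>, the actions, <kill> and <end> follow as in the question -->
</workflow-app>
```

Either way the value is plain text: a default such as the one above cannot itself refer to another parameter.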

Properties are Java properties passed to the jobs that Oozie starts. You can set them in several places:

  • at submission time with the -config argument (yes, it's a mess: the Oozie parser has to sort out which params have a well-known property name and which ones are just params)
  • in the <global> Workflow element (but they will not be propagated to all "extensions", as you have discovered the hard way)
  • in the <property> Action element
  • inside an XML file defined with a <job-xml> element, either at global Workflow level or at local Action level

Two things to note:

  • when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule, but I'm not too sure what it is
  • properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
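
The second point can be sketched inside the Sqoop action from the question. This is only an illustration: queueName is a hypothetical parameter that would have to be supplied at submission time, and mapred.job.queue.name is used merely as a familiar Hadoop property name.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- The file content goes to the Hadoop Configuration as-is;
         its property names do not become ${...} variables in this workflow -->
    <job-xml>/user/dm/conf/environment.xml</job-xml>
    <configuration>
        <property>
            <name>mapred.job.queue.name</name>
            <!-- Inline property: Oozie substitutes the (hypothetical) queueName parameter here -->
            <value>${queueName}</value>
        </property>
    </configuration>
    <arg>import</arg>
    <!-- remaining <arg> elements as in the question -->
</sqoop>
```

The inline <property> is visible to Oozie and can use parameters and EL functions, while the properties inside the <job-xml> file only reach the launched job.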

What does it mean for you? Well, your workflow tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen, which is why ${dbserver} cannot be resolved in your <arg> elements and you get the EL_ERROR.
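
One possible way out, assuming it is acceptable to supply the connection settings at submission time, is to move dbserver, dbuser and dbpassword into the file passed with -config. There Oozie treats them as parameters and can substitute them in the <arg> elements. The sketch below reuses the placeholder values from the question.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <!-- existing entries (nameNode, jobTracker, oozie.wf.application.path,
         outputRootPath) stay as they are -->
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
```

The submission command stays the same (oozie job ... -config job-config.xml -run), and the <global> <job-xml> reference is then no longer needed for these three values.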
