OOZIE: properties defined in a file referenced in the global job-xml are not visible in workflow.xml


Problem description


I'm new to Hadoop, and right now I'm testing a simple workflow with just a single Sqoop action. It works if I use plain values, but not with global properties.

My objective, however, was to define some global properties in a file referenced by the job-xml tag in the global section.

After a long fight and reading many articles, I still cannot make it work. I suspect something simple is wrong, since I found articles suggesting that this feature works fine.

Hopefully, you can give me a hint.

In short:

  1. I have the properties dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
  2. These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
  3. At runtime, I receive an EL_ERROR saying that the dbserver variable cannot be resolved

Here are the details:

I'm using the Cloudera 5.7.1 distribution, installed on a single node.

The environment.xml file was uploaded to HDFS, into the /user/dm/conf folder. Here is the content:

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>

The workflow.xml file was uploaded to /user/dm/jobs/sqoop-test-job. Here is the content:

<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/environment.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>       
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>

Now I execute the Oozie workflow from the command line:

sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run

My job-config.xml is as follows:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>

Solution

OK, you are making two big mistakes.

1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2).

The Workflow Functional Specification:

  • has section 19, about Global Configuration
  • has sections 3.2.x, about the core Action types, i.e. MapReduce, Pig, Java, etc.
  • has an XML schema specification that clearly shows the <global> element

The Sqoop Action Extension:

  • does not make any mention of global parameters
  • has its own XML schema specification, which evolves at its own pace and is not kept up to date with the Workflow schema

In other words, as far as the Oozie server is concerned, the Sqoop action is a plug-in. It does not support 100% of the "newer" functionality, including the <global> element that was introduced in Workflow schema V0.4.
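
If you still want the Sqoop job itself to receive environment.xml, one workaround consistent with this is to repeat the <job-xml> inside the Sqoop action instead of relying on <global>. This is a sketch that assumes the sqoop-action:0.3 schema accepts a <job-xml> element right after <prepare>; check the XSD bundled with your Oozie version. Note that it only hands the file to the Sqoop job's Hadoop configuration at run time: it does not make ${dbserver} resolvable in workflow.xml, for the reasons given in point 2.

```xml
<action name="get-data">
    <sqoop xmlns="uri:oozie:sqoop-action:0.3">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <prepare>
            <delete path="${outputRootPath}"/>
        </prepare>
        <!-- Action-level job-xml: the file content goes to the Sqoop job's
             Hadoop configuration at run time; Oozie does not use it to
             resolve EL expressions in this workflow.xml -->
        <job-xml>/user/dm/conf/environment.xml</job-xml>
        <arg>import</arg>
        <!-- remaining <arg> elements as in the question -->
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>
```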


2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.

Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at the Workflow level. And by "literal" I mean that you cannot reference a parameter inside another parameter. The value is just immutable text, used as-is.
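
For illustration, the <parameters> element could declare the three values directly in workflow.xml. This sketch assumes Workflow schema 0.4 (which the question already uses), where <parameters> must be the first child of <workflow-app>. As I understand the spec, a <value> here acts as a default that a value given via -config overrides, and a parameter without a <value> must be supplied at submission time. Only the top of the file is shown.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <!-- Parameters: plain text that Oozie substitutes into EL expressions,
         such as the dbserver reference in the connect string -->
    <parameters>
        <property>
            <name>dbserver</name>
            <value>someserver</value>
        </property>
        <property>
            <name>dbuser</name>
            <value>someuser</value>
        </property>
        <property>
            <!-- no default: has to be supplied with -config at submission -->
            <name>dbpassword</name>
        </property>
    </parameters>
    <!-- <global>, <start>, the get-data action, <kill> and <end> follow unchanged -->
</workflow-app>
```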

Properties are Java properties passed to the jobs that Oozie starts. You can set them in several places:

  • at submission time, with the -config argument (yes, it's a mess: the Oozie parser has to sort out which params have a well-known property name and which ones are just params)
  • in the <global> Workflow element (but they will not be propagated to all "extensions", as you have discovered the hard way)
  • in <property> elements inside the Action's <configuration>
  • inside an XML file defined with the <job-xml> element, either at the global Workflow level or at the local Action level

Two things to note:

  • when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule, but I'm not too sure what it is
  • properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals, because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run time)
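
To make the first half of that second point concrete: a property declared explicitly in the action's <configuration> can combine parameters and EL functions, and Oozie evaluates it before starting the job. In this minimal sketch the property names dm.db.server and dm.workflow.id are hypothetical, chosen only for illustration, and dbserver is assumed to be a parameter (supplied via -config or <parameters>). wf:id() is Oozie's EL function returning the workflow job ID. Per the explanation above, the same name/value pairs written into a <job-xml> file would be handed over without Oozie evaluating them.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
        <!-- hypothetical property: value taken from the dbserver parameter -->
        <property>
            <name>dm.db.server</name>
            <value>${dbserver}</value>
        </property>
        <!-- hypothetical property: value computed with an EL function -->
        <property>
            <name>dm.workflow.id</name>
            <value>${wf:id()}</value>
        </property>
    </configuration>
    <arg>import</arg>
    <!-- remaining <arg> elements as in the question -->
</sqoop>
```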

What does it mean for you? Well, your script tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen.
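
A minimal way to get the behaviour you expected is therefore to turn dbserver, dbuser and dbpassword into parameters by supplying them at submission time. This sketch simply adds them to the job-config.xml from the question, with the values copied from environment.xml; it assumes you accept keeping the password in that local file. The ${dbserver}, ${dbuser} and ${dbpassword} references in workflow.xml can stay as they are, and the job is submitted with the same oozie job ... -config job-config.xml -run command.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
    <!-- Added: submitted with -config, so Oozie can substitute them in workflow.xml -->
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
```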
