OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml
I'm new to Hadoop and am testing a simple workflow with just a single Sqoop action. It works if I use plain values rather than global properties.
My objective, however, was to define some global properties in a file referenced in the job-xml tag in the global section.
After a long fight and reading many articles, I still cannot make it work. I suspect something simple is wrong, since I found articles suggesting that this feature works fine.
Hopefully, you can give me a hint.
In short:
- I have the properties dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
- These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
- At runtime, I receive an EL_ERROR saying that the dbserver variable cannot be resolved
Here are the details:
I'm using the Cloudera 5.7.1 distribution installed on a single node.
The environment.xml file was uploaded to HDFS, into the /user/dm/conf folder. Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
The workflow.xml file was uploaded to /user/dm/jobs/sqoop-test-job. Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
Now, I execute the Oozie workflow from the command line:
sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run
Where my job-config.xml is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>
OK, you are making two big mistakes.
1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2)
Workflow Functional Specification
- has a section 19 about Global Configuration
- has sections 3.2.x about core Action types, i.e. MapReduce, Pig, Java, etc.
- the XML schema specification clearly shows the <global> element
Sqoop Action Extension
- does not make any mention of Global parameters
- has its own XML schema specification, which evolves at its own pace and is not up to date with the Workflow schema
In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionalities, including the <global> thing that was introduced in Workflow schema V0.4.
2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.
Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
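As a minimal sketch of the second option, the workflow could declare its parameters in a <parameters> element placed before <global>. The values are the placeholders from the question, used here as literal defaults; a value supplied at submission time with -config should be used instead of the default. This assumes workflow schema 0.4 and has not been tested on CDH 5.7.1.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <parameters>
        <!-- each value is a literal default; it cannot reference another parameter -->
        <property>
            <name>dbserver</name>
            <value>someserver</value>
        </property>
        <property>
            <name>dbuser</name>
            <value>someuser</value>
        </property>
        <property>
            <name>dbpassword</name>
            <value>somepassword</value>
        </property>
    </parameters>
    <!-- start, action, kill and end elements as in the question's workflow.xml -->
</workflow-app>
```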
Properties are Java properties passed to the jobs that Oozie starts. You can set them either at submission time with the -config argument -- yes, it's a mess, the Oozie parser has to sort out which params have a well-known property name and which ones are just params -- or in the <global> Workflow element -- but they will not be propagated in all "extensions", as you have discovered the hard way -- or in the <property> Action element, or inside an XML file defined with a <job-xml> element, either at global Workflow level or at local Action level.
Two things to note:
- when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule, but I'm not too sure what it is
- properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
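To illustrate the second point, here is a hedged sketch of a Sqoop action that sets properties both ways. The queueName parameter is hypothetical (it is not in the question and would have to be supplied like nameNode), and mapred.job.queue.name is used only as a familiar Hadoop property name.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- properties in this file must be literals: Oozie passes the file on without substituting anything -->
    <job-xml>/user/dm/conf/environment.xml</job-xml>
    <configuration>
        <!-- property defined inside Oozie: its value may use a parameter or an EL function -->
        <property>
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <arg>import</arg>
    <!-- remaining arg elements as in the question -->
</sqoop>
```

Either way, these are properties handed to the launched job; neither form makes dbserver available for substitution in the arg elements.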
What does it mean for you? Well, your script tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml>.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen.
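One way to get the substitution the question was after, sketched on the assumption that the three values may live in the submission file: add them to job-config.xml next to nameNode and jobTracker, so that they are ordinary parameters when Oozie evaluates ${dbserver}, ${dbuser} and ${dbpassword}.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <!-- the existing nameNode, jobTracker, oozie.wf.application.path and outputRootPath entries stay as they are -->
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
```

The job is then submitted with the same oozie job ... -config job-config.xml -run command as before, and the <job-xml> in <global> is no longer needed for the substitution.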