OOZIE: properties defined in file referenced in global job-xml not visible in workflow.xml
I'm new to Hadoop and am testing a simple workflow with just a single Sqoop action. It works if I use plain values rather than global properties.
My objective, however, was to define some global properties in a file referenced by the job-xml tag in the global section.
After a long fight and reading many articles, I still cannot make it work. I suspect some simple thing is wrong, since I found articles suggesting that this feature works fine.
Hopefully, you can give me a hint.
In short:
- I have the properties dbserver, dbuser and dbpassword defined in /user/dm/conf/environment.xml
- These properties are referenced in my /user/dm/jobs/sqoop-test/workflow.xml
- At runtime, I receive an EL_ERROR saying that the dbserver variable cannot be resolved
Here are the details:
I'm using the Cloudera 5.7.1 distribution installed on a single node.
The environment.xml file was uploaded into HDFS, into the /user/dm/conf folder.
Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
The workflow.xml file was uploaded into /user/dm/jobs/sqoop-test-job. Here is the content:
<?xml version="1.0" encoding="UTF-8"?>
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <global>
        <job-xml>/user/dm/conf/env.xml</job-xml>
    </global>
    <start to="get-data"/>
    <action name="get-data">
        <sqoop xmlns="uri:oozie:sqoop-action:0.3">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
                <delete path="${outputRootPath}"/>
            </prepare>
            <arg>import</arg>
            <arg>--connect</arg>
            <arg>jdbc:sqlserver://${dbserver};user=${dbuser};password=${dbpassword}</arg>
            <arg>--query</arg>
            <arg>select col1 from table where $CONDITIONS</arg>
            <arg>--split-by</arg>
            <arg>main_id</arg>
            <arg>--target-dir</arg>
            <arg>${outputRootPath}/table</arg>
            <arg>-m</arg>
            <arg>1</arg>
        </sqoop>
        <ok to="end"/>
        <error to="kill"/>
    </action>
    <kill name="kill">
        <message>Sqoop-test failed, error message[${wf:errorMessage()}]</message>
    </kill>
    <end name='end'/>
</workflow-app>
Now, I execute the Oozie workflow from the command line:
sudo -u dm oozie job --oozie http://host:11000/oozie -config job-config.xml -run
where my job-config.xml is as follows:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <property>
        <name>nameNode</name>
        <value>namenode:8020</value>
    </property>
    <property>
        <name>jobTracker</name>
        <value>jobtracker:8021</value>
    </property>
    <property>
        <name>oozie.wf.application.path</name>
        <value>/user/dm/jobs/sqoop-test-job/workflow.xml</value>
    </property>
    <property>
        <name>outputRootPath</name>
        <value>/user/dm/data/sqoop-test</value>
    </property>
</configuration>
OK, you are making two big mistakes.
1. Let's start with a quick exegesis of some parts of the Oozie documentation (V4.2).
The Workflow Functional Specification:
- has a section 19 about Global Configuration
- has sections 3.2.x about core Action types, i.e. MapReduce, Pig, Java, etc.
- has an XML schema specification that clearly shows the <global> element
The Sqoop action extension documentation:
- does not make any mention of Global parameters
- has its own XML schema specification, which evolves at its own pace and is not up to date with the Workflow schema
In other words: the Sqoop action is a plug-in as far as the Oozie server is concerned. It does not support 100% of the "newer" functionalities, including the <global> element that was introduced in Workflow schema V0.4.
2. You don't understand the distinction between properties and parameters -- and I don't blame you, the Oozie docs are confused and confusing.
Parameters are used by Oozie to run text substitutions in properties, in commands, etc. You define their values as literals, either at submission time with the -config argument, or in the <parameters> element at Workflow level. And by "literal" I mean that you cannot make reference to a parameter in another parameter. The value is just immutable text, used as-is.
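To make the second option concrete, here is a minimal sketch of a workflow-level <parameters> block, assuming workflow schema 0.4. It reuses the names from the question; the default value shown for outputRootPath is only an illustration.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="sqoop-test">
    <!-- Formal parameters: plain text values, substituted wherever ${name} appears -->
    <parameters>
        <property>
            <!-- No <value>: must be supplied at submission time through -config -->
            <name>dbserver</name>
        </property>
        <property>
            <name>outputRootPath</name>
            <!-- Literal default (illustrative), used when -config does not set it -->
            <value>/user/dm/data/sqoop-test</value>
        </property>
    </parameters>
    <start to="get-data"/>
    <!-- action, kill and end nodes as in the question -->
</workflow-app>
```

Declaring a parameter without a default lets Oozie reject the submission early when the value is missing, instead of failing later with an EL_ERROR.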
Properties are Java properties passed to the jobs that Oozie starts. You can set them in several places:
- at submission time with the -config argument (yes, it's a mess: the Oozie parser has to sort out which params have a well-known property name and which ones are just params)
- in the <global> Workflow element (but they will not be propagated in all "extensions", as you have discovered the hard way)
- in the <property> Action element
- inside an XML file defined with a <job-xml> element, either at global Workflow level or at local Action level
Two things to note:
- when properties are defined multiple times with multiple (conflicting) values, there has to be a precedence rule, but I'm not sure what it is
- properties defined explicitly inside Oozie may have their value defined dynamically, using parameters and EL functions; but properties defined inside <job-xml> files must be literals, because Oozie does not have access to them (it just passes the file content to the Hadoop Configuration constructor at run-time)
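The contrast in the second point can be sketched inside a single Sqoop action. This is an illustrative fragment, not a tested configuration: queueName is a hypothetical parameter that would have to be supplied with -config, and the file path is the one from the question.

```xml
<sqoop xmlns="uri:oozie:sqoop-action:0.3">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <!-- Oozie does not read inside this file: values in it must be literals,
         and its property names are NOT usable as ${...} in the workflow -->
    <job-xml>/user/dm/conf/environment.xml</job-xml>
    <configuration>
        <property>
            <!-- Defined explicitly in the workflow, so the value may use
                 parameters and EL functions; queueName is hypothetical -->
            <name>mapred.job.queue.name</name>
            <value>${queueName}</value>
        </property>
    </configuration>
    <arg>import</arg>
    <!-- remaining <arg> elements as in the question -->
</sqoop>
```

In both cases the result is a property handed to the launched job; only the explicit <property> gets its value resolved by Oozie beforehand.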
What does it mean for you? Well, your script tells Oozie to pass "hidden" properties to the JVM running the Sqoop job, at run-time, through a <job-xml> element.
But you were expecting Oozie to parse a list of parameters and use them, at compile time, to define some properties. That won't happen.
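Following that distinction, one possible way out is to supply the three values as parameters at submission time, by adding them to the job-config.xml already passed with -config. This is a sketch under that assumption, not something verified on CDH 5.7.1; the values are the placeholders from the question.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
    <!-- existing entries (nameNode, jobTracker, oozie.wf.application.path,
         outputRootPath) stay as they are -->
    <property>
        <name>dbserver</name>
        <value>someserver</value>
    </property>
    <property>
        <name>dbuser</name>
        <value>someuser</value>
    </property>
    <property>
        <name>dbpassword</name>
        <value>somepassword</value>
    </property>
</configuration>
```

With the values defined this way, ${dbserver}, ${dbuser} and ${dbpassword} in the <arg> elements are ordinary parameter substitutions, and the <global> <job-xml> reference is no longer needed for them.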