Oozie file based coordinator


Problem description

I'm trying to create a coordinator with a file-based dependency. My goal is for the coordinator to run the workflow only once the specified file has been created; if the file does not exist yet, the coordinator should wait until it is created. I tried the following:

<coordinator-app name="MY_APP" frequency="1440" start="2009-02-01T00:00Z" end="2009-02-07T00:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="input1" frequency="60" initial-instance="2009-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      <done-flag>trigger.dat</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="input1">
      <start-instance>${coord:current(-23)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://localhost:9000/tmp/workflows</app-path>
    </workflow>
  </action>     
</coordinator-app>

I started the Oozie job and it is in the WAITING state. I then ran a script that creates the file (trigger.dat) in the specified directory structure in HDFS (hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}). The file was created, but the job is still in the WAITING state.

Could anyone help me with this?

Solution

I changed the start and end dates, and it is working now.

The working coordinator.xml is:

<coordinator-app name="MY_APP" frequency="60" start="2015-01-12T05:00Z" end="2015-01-12T08:00Z" timezone="UTC" xmlns="uri:oozie:coordinator:0.1">
  <datasets>
    <dataset name="input1" frequency="30" initial-instance="2015-01-12T04:02Z" timezone="UTC">
      <uri-template>hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
      <done-flag>trigger.dat</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="coordInput1" dataset="input1">
      <start-instance>${coord:current(-1)}</start-instance>
      <end-instance>${coord:current(0)}</end-instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs://localhost:9000/tmp/workflows</app-path>
      <configuration>
        <property>
          <name>property1</name>
          <value>${coord:dataIn('coordInput1')}</value>
        </property>
      </configuration>
    </workflow>
  </action>     
</coordinator-app>
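
To see which HDFS directories a given coordinator action waits for, the resolution of ${coord:current(n)} can be approximated offline. The sketch below is not Oozie code: the function names are hypothetical, and it assumes that current(0) is the latest dataset instance at or before the action's nominal time, that each step of n moves by one dataset frequency, and that instances earlier than initial-instance are skipped.

```python
from datetime import datetime, timedelta, timezone
from string import Template

# Same uri-template as in the coordinator.xml above.
URI_TEMPLATE = "hdfs://localhost:9000/tmp/revenue_feed/${YEAR}/${MONTH}/${DAY}/${HOUR}"


def parse_utc(ts):
    """Parse an Oozie-style UTC timestamp such as 2015-01-12T05:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%MZ").replace(tzinfo=timezone.utc)


def expected_dirs(nominal_ts, initial_ts, freq_minutes, start_n, end_n,
                  uri_template=URI_TEMPLATE):
    """Approximate the directories for coord:current(start_n)..current(end_n).

    Hypothetical helper for illustration only; it mirrors the assumed
    semantics described in the text, not Oozie's implementation.
    """
    nominal = parse_utc(nominal_ts)
    initial = parse_utc(initial_ts)
    freq = timedelta(minutes=freq_minutes)
    # Whole dataset periods elapsed between initial-instance and nominal time.
    periods = (nominal - initial) // freq
    dirs = []
    for n in range(start_n, end_n + 1):
        instance = initial + (periods + n) * freq
        if instance < initial:
            # Assumption: instances before initial-instance are ignored.
            continue
        dirs.append(Template(uri_template).substitute(
            YEAR=f"{instance.year:04d}",
            MONTH=f"{instance.month:02d}",
            DAY=f"{instance.day:02d}",
            HOUR=f"{instance.hour:02d}",
        ))
    return dirs


if __name__ == "__main__":
    # First action of the working coordinator (nominal time = start).
    for path in expected_dirs("2015-01-12T05:00Z", "2015-01-12T04:02Z", 30, -1, 0):
        print(path)
```

Under these assumptions, the first action (nominal time 2015-01-12T05:00Z) resolves both instances, 04:02 and 04:32, to the same directory .../2015/01/12/04. Applied to the coordinator in the question (frequency 1440, dataset frequency 60, current(-23) to current(0)), the same arithmetic gives 24 hourly directories from .../2009/01/31/01 to .../2009/02/01/00, which may explain why a trigger file created under a different date left that job in WAITING.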

Some points I observed:

  1. The expected directory structure is derived from the initial-instance="2015-01-12T04:02Z" and frequency="30" of the dataset we define.

  2. In my setup, Oozie did not consider the dataset unless the property below was declared:

    <property> <name>property1</name> <value>${coord:dataIn('coordInput1')}</value> </property>

  3. Oozie always works in the GMT/UTC time zone. When scheduling any workflow, keep GMT in mind and set the start and end times accordingly.

  4. Until the directory is created, the coordinator job stays in the RUNNING state, while the coordinator action that would launch the workflow stays in the WAITING state.
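
For point 3, a small helper can turn a local wall-clock time into the UTC timestamp format used by the start, end and initial-instance attributes above. This is a minimal sketch: the function name and the +05:30 example offset are assumptions chosen for illustration, not part of the original setup.

```python
from datetime import datetime, timedelta, timezone


def to_oozie_utc(local_dt):
    """Format a timezone-aware datetime as yyyy-MM-ddTHH:mmZ in UTC.

    Hypothetical helper; it only does the time zone conversion and
    formatting, it does not talk to Oozie.
    """
    if local_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return local_dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%MZ")


# Assumed local zone for the example: a fixed UTC+05:30 offset.
LOCAL = timezone(timedelta(hours=5, minutes=30))

if __name__ == "__main__":
    # 10:30 local time corresponds to the start="2015-01-12T05:00Z" used above.
    print(to_oozie_utc(datetime(2015, 1, 12, 10, 30, tzinfo=LOCAL)))
```

The same conversion applies when deciding which date and hour directories the trigger file has to be written to, since the ${YEAR}/${MONTH}/${DAY}/${HOUR} values in the dataset above are resolved with timezone="UTC".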
